In this paper, the forgetting of the initial distribution for non-ergodic
hidden Markov models (HMMs) is studied. A new set of conditions is proposed
to establish the forgetting property of the filter, which significantly
extends the existing results. Both a pathwise-type convergence of the total
variation distance of the filter started from two different initial
distributions and a convergence in expectation are considered. The results
are illustrated using generic models of non-ergodic HMMs and extend the
results known so far.



1. Introduction and notations. A hidden Markov model (HMM) is
a doubly stochastic process with an underlying Markov chain that is not
directly observable. More specifically, let $\mathsf{X}$ and $\mathsf{Y}$ be two spaces equipped
with countably generated $\sigma$-fields $\mathcal{X}$ and $\mathcal{Y}$; denote by $Q$ and $G$, respectively,
a Markov transition kernel on $(\mathsf{X},\mathcal{X})$ and a transition kernel from $(\mathsf{X},\mathcal{X})$ to
$(\mathsf{Y},\mathcal{Y})$. Consider the Markov transition kernel defined for any $(x,y)\in\mathsf{X}\times\mathsf{Y}$
and $C\in\mathcal{X}\otimes\mathcal{Y}$ by

$$T[(x,y),C] := Q\otimes G[(x,y),C] = \iint Q(x,dx')\,G(x',dy')\,\mathbf{1}_C(x',y') . \tag{1}$$

We consider $\{X_k,Y_k\}_{k\geq 0}$, the Markov chain with transition kernel $T$ and
initial distribution $\nu\otimes G(C) := \iint \nu(dx)\,G(x,dy)\,\mathbf{1}_C(x,y)$, where $\nu$ is a
probability measure on $(\mathsf{X},\mathcal{X})$. We assume that the chain $\{X_k\}_{k\geq 0}$ is not observable
(hence the name hidden). In addition, we assume that there exists a measure
$\mu$ on $(\mathsf{Y},\mathcal{Y})$ such that for all $x\in\mathsf{X}$, $G(x,\cdot)$ is absolutely continuous with
respect to $\mu$; under these assumptions, the joint transition kernel $T$ may be
expressed as

$$T[(x,y),C] = \iint Q(x,dx')\,g(x',y')\,\mathbf{1}_C(x',y')\,\mu(dy') , \qquad C\in\mathcal{X}\otimes\mathcal{Y} , \tag{2}$$

where $g(x,\cdot) = dG(x,\cdot)/d\mu$ denotes the Radon-Nikodym derivative of $G(x,\cdot)$ with
respect to $\mu$; $g(x,\cdot)$ is referred to as the likelihood of the observation. We
denote by $\phi_{\nu,n}[y_{0:n}]$ the distribution of the hidden state $X_n$ conditionally on
the observations $y_{0:n} := [y_0,\dots,y_n]$, which is given by

$$\phi_{\nu,n}[y_{0:n}](A) = \frac{\nu\left[g(\cdot,y_0)\,Q g(\cdot,y_1)\,Q\cdots Q g(\cdot,y_n)\mathbf{1}_A\right]}{\nu\left[g(\cdot,y_0)\,Q g(\cdot,y_1)\,Q\cdots Q g(\cdot,y_n)\right]}
= \frac{\int_{\mathsf{X}^{n+1}} \nu(dx_0)\,g(x_0,y_0)\prod_{i=1}^{n} Q(x_{i-1},dx_i)\,g(x_i,y_i)\,\mathbf{1}_A(x_n)}{\int_{\mathsf{X}^{n+1}} \nu(dx_0)\,g(x_0,y_0)\prod_{i=1}^{n} Q(x_{i-1},dx_i)\,g(x_i,y_i)} , \tag{3}$$

where $Qf(x) = Q(x,f) := \int Q(x,dx')f(x')$ for any function $f\in\mathbb{B}^+(\mathsf{X})$, the
set of non-negative functions $f:\mathsf{X}\to\mathbb{R}$ such that $f$ is $\mathcal{X}/\mathcal{B}(\mathbb{R})$ measurable,
with $\mathcal{B}(\mathbb{R})$ the Borel $\sigma$-algebra. Let $(\Omega,\mathcal{F},\mathbb{P}_\star)$ be a probability space and
$\{Y_k\}_{k\geq 0}$ be a $\mathsf{Y}$-valued stochastic process defined on $(\Omega,\mathcal{F},\mathbb{P}_\star)$.

A typical question is under which conditions the distance between the
filtering measures $\phi_{\nu,n}$ and $\phi_{\nu',n}$ for two different choices of the initial
distribution $\nu$ and $\nu'$ vanishes, i.e.

$$\lim_{n\to\infty} \left\| \phi_{\nu,n}[Y_{0:n}] - \phi_{\nu',n}[Y_{0:n}] \right\|_{\mathrm{TV}} = 0 , \qquad \mathbb{P}_\star\text{-a.s.} ,$$

where $\|\cdot\|_{\mathrm{TV}}$ denotes the total variation norm. We stress that $\{Y_k\}_{k\geq 0}$ is not
necessarily itself the observation sequence associated to the HMM used to
define the sequence of filtering distributions. In other words, we are
interested in the forgetting of the initial condition even when
the model is misspecified, which is often the case in practical
settings. The forgetting property of the initial condition of the optimal
filter in nonlinear state-space models has attracted many research efforts; see
for example the in-depth tutorial of [5]. The brief overview below is mainly
intended to allow comparison of the assumptions and results presented in this
contribution with those previously reported in the literature.
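The filter recursion (3) and the forgetting question can be illustrated numerically on a finite state space. The following is only a minimal sketch under stated assumptions: the three-state kernel `Q`, the Gaussian-shaped likelihood `g`, the observation list `ys` and the two initial distributions are hypothetical choices made for illustration, and the total variation distance is taken as the supremum over sets. The toy kernel is strictly positive, so it falls in the uniformly mixing (ergodic) regime rather than the non-ergodic setting studied in this paper.

```python
import math

def normalize(w):
    """Return w / sum(w); w must have a positive sum."""
    z = sum(w)
    return [x / z for x in w]

def filter_path(nu, Q, g, ys):
    """Filtering distributions phi_{nu,n}[y_{0:n}], n = 0..len(ys)-1, on a finite state space.

    nu: initial distribution, Q: transition matrix (list of rows),
    g(x, y): likelihood of observation y in state x, ys: observations.
    """
    m = len(nu)
    phi = normalize([nu[x] * g(x, ys[0]) for x in range(m)])
    path = [phi]
    for y in ys[1:]:
        # Prediction with Q, then correction by the likelihood, then normalization.
        pred = [sum(phi[x] * Q[x][x2] for x in range(m)) for x2 in range(m)]
        phi = normalize([pred[x2] * g(x2, y) for x2 in range(m)])
        path.append(phi)
    return path

def tv_distance(p, q):
    """sup_A |p(A) - q(A)| for two distributions on a finite set."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

# Hypothetical toy model (not from the paper).
Q = [[0.6, 0.3, 0.1], [0.2, 0.6, 0.2], [0.1, 0.3, 0.6]]
g = lambda x, y: math.exp(-0.5 * (y - x) ** 2)
# Arbitrary observations: they are not simulated from the model.
ys = [0.3, 1.2, 2.1, 0.9, 1.5, 0.2, 1.8, 1.1, 0.4, 2.3] * 3

path_a = filter_path([0.98, 0.01, 0.01], Q, g, ys)
path_b = filter_path([0.01, 0.01, 0.98], Q, g, ys)
distances = [tv_distance(p, q) for p, q in zip(path_a, path_b)]
print(distances[0], distances[-1])
```

The two filters are driven by the same observations and differ only through their initial distributions, so the sequence `distances` is the quantity whose convergence to zero is studied in the sequel.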

The filtering equation can be seen as a positive random nonlinear operator
acting on the space of probability measures; the forgetting property can
be investigated using tools from the theory of positive operators, namely
the Birkhoff contraction inequality for the Hilbert projective metric (see [1],
[14], [13]). The results obtained using this approach require stringent mixing
conditions for the transition kernels; these conditions state that there exist
positive constants $\epsilon_-$ and $\epsilon_+$ and a probability measure $\lambda$ on $(\mathsf{X},\mathcal{X})$ such
that for $f\in\mathbb{B}^+(\mathsf{X})$,

$$\epsilon_-\lambda(f) \leq Q(x,f) \leq \epsilon_+\lambda(f) , \qquad \text{for any } x\in\mathsf{X} . \tag{4}$$

This condition in particular implies that the chain is uniformly geometrically
ergodic. Similar results were obtained independently by [6] using the
Dobrushin ergodicity coefficient (see [7] for further refinements under this
assumption). The mixing condition was later weakened by [4], under
the assumption that the kernel $Q$ is positive recurrent and is dominated by
some reference measure $\lambda$:

$$\sup_{(x,x')\in\mathsf{X}\times\mathsf{X}} q(x,x') < \infty \qquad\text{and}\qquad \int \operatorname{ess\,inf} q(x,x')\,\pi(x)\,\lambda(dx) > 0 ,$$

where $q(x,\cdot) = dQ(x,\cdot)/d\lambda$, essinf is the essential infimum with respect to $\lambda$ and
$\pi\,d\lambda$ is the stationary distribution of the chain $Q$. Although the upper bound is
reasonable, the lower bound is restrictive in many applications and fails to
be satisfied, e.g., for the linear Gaussian state-space model.

In [14], the stability of the optimal filter is studied for a class of kernels
referred to as pseudo-mixing. The definition of pseudo-mixing kernel is
adapted to the case where the state space is $\mathsf{X} = \mathbb{R}^d$, equipped with the Borel
sigma-field $\mathcal{X}$. A kernel $Q$ on $(\mathsf{X},\mathcal{X})$ is pseudo-mixing if for any compact set
$C$ with a diameter $d$ large enough, there exist positive constants $\epsilon_-(d) > 0$
and $\epsilon_+(d) > 0$ and a measure $\lambda_C$ (which may be chosen to be finite without
loss of generality) such that

$$\epsilon_-(d)\lambda_C(A) \leq Q(x,A) \leq \epsilon_+(d)\lambda_C(A) , \qquad \text{for any } x\in C,\ A\in\mathcal{X} . \tag{5}$$

This condition implies that for any $(x',x'')\in C\times C$,

$$\frac{\epsilon_-(d)}{\epsilon_+(d)} \leq \operatorname*{ess\,inf}_{x\in\mathsf{X}} \frac{q(x',x)}{q(x'',x)} \leq \operatorname*{ess\,sup}_{x\in\mathsf{X}} \frac{q(x',x)}{q(x'',x)} \leq \frac{\epsilon_+(d)}{\epsilon_-(d)} ,$$

where $q(x,\cdot) = dQ(x,\cdot)/d\lambda_C$, and esssup and essinf denote the essential
supremum and infimum with respect to $\lambda_C$. This condition is obviously
more general than (4), but still it is not satisfied in the linear Gaussian case
(see [14, Example 4.3]).

Several attempts have been made to establish stability
under the so-called small noise condition. The first result in this direction
was obtained by [1] (in continuous time), who considered an ergodic
diffusion process with constant diffusion coefficient and linear observations:
when the variance of the observation noise is sufficiently small, [1] established
that the filter is exponentially stable. Small noise conditions also appeared
(in a discrete-time setting) in [3] and [15]. These results do not cover
the linear Gaussian state-space model with arbitrary noise variance.

A very significant step was achieved by [12], who considered the
filtering problem of a Markov chain $\{X_k\}_{k\geq 0}$ with values in $\mathsf{X} = \mathbb{R}^d$ filtered
from observations $\{Y_k\}_{k\geq 0}$ in $\mathsf{Y} = \mathbb{R}^\ell$,

$$\begin{cases} X_{k+1} = X_k + b(X_k) + \sigma(X_k)\zeta_k , \\ Y_k = h(X_k) + \beta\varepsilon_k . \end{cases} \tag{6}$$

Here $\{(\zeta_k,\varepsilon_k)\}_{k\geq 0}$ is an i.i.d. sequence of standard Gaussian random vectors
in $\mathbb{R}^{d+\ell}$, $b(\cdot)$ is a $d$-dimensional vector function, $\sigma(\cdot)$ a $d\times d$-matrix function,
$h(\cdot)$ is an $\ell$-dimensional vector function and $\beta > 0$. The authors established,
under appropriate conditions on $b$, $h$ and $\sigma$, that the optimal filter forgets
the initial conditions; these conditions cover (with some restrictions) the
linear Gaussian state-space model.

A new approach for ergodic HMMs using the so-called Local Doeblin property
is proposed in [8]. Both almost sure convergence and convergence in
expectation for the distance in total variation norm for two filters with
different initial distributions are proven. The results hold under weaker
conditions than those appearing under other mixing assumptions and, in
particular, cover the linear Gaussian state-space model. Moreover, assumptions
on observations are relaxed and convergence theorems apply to sequences
which are not necessarily HMMs.

The works mentioned above mainly deal with ergodic HMMs, i.e. the
situations in which the hidden Markov chain is ergodic. Non-ergodic HMMs
are routinely used in the nonlinear filtering literature, many models
used for example in tracking or financial econometrics being simply random
walks (see [9] and [16] and the references therein). Yet non-ergodic HMMs have
been studied much less frequently in the literature. The main references
in this direction are [3] and [15]. In [3], the observation process is the signal
(state) corrupted by an additive white noise of sufficiently small
variance. In [15], the authors assumed that the observation is a possibly
nonlinear function of the signal (satisfying some additional technical
conditions) and that this function of the signal is also observed in an additive
noise model of sufficiently small variance. The authors propose to truncate
the Markov kernels on random sets depending on the observation sequences,
which are chosen in such a way that the truncated kernels satisfy mixing
conditions. The authors establish the convergence of the first-order moment
of the difference under a signal-to-noise ratio condition.

In this contribution, we propose a new set of conditions to establish the
forgetting property of the filter, which are more general than those proposed
in [3] and [15]. In Theorem 5, the convergence of the total variation distance
of the filter started from two different initial distributions is established, and
is shown to hold almost surely w.r.t. the probability distribution of the
observation process $\{Y_k\}_{k\geq 0}$. Then, in Theorem 6, a bound for the expectation
of this total variation distance is obtained and used in Section 3 for
nonlinear state-space models to obtain a geometric rate. The results are shown
to hold under rather weak conditions on the observation process $\{Y_k\}_{k\geq 0}$,
which do not necessarily entail that the observations are produced by the
filtering model.

The paper is organized as follows. In Section 2, we introduce the
assumptions and state the main results. In Section 3, nonlinear state-space models
are considered with different kinds of dependence for the state noise and with
observations not necessarily produced by the model defining the filter. The
proofs are given in Sections 4 and 5.

2. Main results. In this section, we present two theorems on the
forgetting properties of the optimal filter. These results require the choice of
a set-valued function, referred to as a Local Doeblin set function, which
extends the so-called local Doeblin sets introduced in [19] and later exploited
in [12]. The difference between the LD-sets of [19] and LD-set functions lies in
the dependence on the successive observations.

Definition 1 (LD-set function). A set-valued function $C : y \mapsto C(y)$
from $\mathsf{Y}$ to $\mathcal{X}$ is called a Local Doeblin set function (LD-set function) if there
exists a map $(y,y') \mapsto (\epsilon_C^-(y,y'), \epsilon_C^+(y,y'))$ from $\mathsf{Y}\times\mathsf{Y}$ to $(0,\infty)^2$ such that,
for all $(y,y')\in\mathsf{Y}\times\mathsf{Y}$, there exists a measure $\lambda_{y,y'}$ on $(\mathsf{X},\mathcal{X})$ satisfying

$$\epsilon_C^-(y,y')\,\lambda_{y,y'}[A\cap C(y')] \leq Q[x, A\cap C(y')] \leq \epsilon_C^+(y,y')\,\lambda_{y,y'}[A\cap C(y')] \tag{7}$$

for all $x\in C(y)$ and $A\in\mathcal{X}$.

Some general conditions on the Local Doeblin set function involving the
distributions of the observations ensure the forgetting property of the
optimal filter. The case of nonlinear state-space models is studied in Section 3.
Roughly speaking, inequality (7) means that the transition of the hidden
chain, when the state is in a given subset $C(y)$, does not depend too much
on the current state.
We denote, for a set $A\in\mathcal{X}$ and an observation $y\in\mathsf{Y}$, the supremum of
the likelihood over $A$,

$$\Upsilon_A(y) := \sup_{x\in A} g(x,y) . \tag{8}$$

Consider the following assumptions on the likelihood of the observations:

(H1) For all $(x,y)\in\mathsf{X}\times\mathsf{Y}$, $g(x,y) > 0$.

(H2) For all $\eta > 0$, there exists an LD-set function $C_\eta$ satisfying, for all
$y\in\mathsf{Y}$,

$$\Upsilon_{C_\eta^c(y)}(y) \leq \eta\,\Upsilon_{\mathsf{X}}(y) . \tag{9}$$

The first condition states that the likelihood is everywhere positive. This
excludes the case of additive noise with bounded support; see for example
[2]. When $\mathsf{X} = \mathbb{R}^d$, the second assumption is typically satisfied when, for
any given $y$, the likelihood goes to zero as the state norm $|x|$ goes to infinity:
$\lim_{|x|\to\infty} g(x,y) = 0$. This assumption is satisfied in many models of
practical interest, and roughly means that the observation effectively provides
information on the range of values of the state.
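For a given LD-set function $C$, we set

$$\Phi_{\nu,C}(y,y') := \nu\left[g(\cdot,y)\,Q g(\cdot,y')\mathbf{1}_{C(y')}(\cdot)\right] , \tag{10}$$

$$\Psi_C(y,y') := \lambda_{y,y'}\left[g(\cdot,y')\mathbf{1}_{C(y')}(\cdot)\right] . \tag{11}$$

The main idea of the proof is that the states belong very often to the
LD-sets. Every time the state is in an LD-set and jumps to another LD-set,
the forgetting mechanism comes into play. From now on, for all $(x,x')\in\mathsf{X}^2$,
denote $\bar{x} = (x,x')$ and let $\bar{g}$ be the product $\bar{g}(\bar{x},y) = g(x,y)g(x',y)$. Similarly, for
all $A\in\mathcal{X}$, denote $\bar{A} = A\times A$ and, for every LD-set function $C$, let $\bar{C}$ be the set-valued
function $\bar{C}(y) = C(y)\times C(y)$. Finally, for all $(x,x')\in\mathsf{X}^2$ and $A, B\in\mathcal{X}$, set
$\bar{Q}(x,x',A\times B) = Q(x,A)Q(x',B)$. Writing $\mathbb{E}^Q_\nu$ for the expectation with respect to
the Markov chain with kernel $Q$ and initial distribution $\nu$, for any $A\in\mathcal{X}$ and any two
probability measures $\nu$ and $\nu'$ on $(\mathsf{X},\mathcal{X})$, the difference $\phi_{\nu,n}[y_{0:n}](A) - \phi_{\nu',n}[y_{0:n}](A)$
may be expressed as

$$\phi_{\nu,n}[y_{0:n}](A) - \phi_{\nu',n}[y_{0:n}](A)
= \frac{\mathbb{E}^Q_\nu\left[\prod_{i=0}^n g(X_i,y_i)\mathbf{1}_A(X_n)\right]}{\mathbb{E}^Q_\nu\left[\prod_{i=0}^n g(X_i,y_i)\right]} - \frac{\mathbb{E}^Q_{\nu'}\left[\prod_{i=0}^n g(X_i,y_i)\mathbf{1}_A(X_n)\right]}{\mathbb{E}^Q_{\nu'}\left[\prod_{i=0}^n g(X_i,y_i)\right]}$$

$$= \frac{\mathbb{E}^{\bar{Q}}_{\nu\otimes\nu'}\left[\prod_{i=0}^n \bar{g}(\bar{X}_i,y_i)\mathbf{1}_A(X_n)\right] - \mathbb{E}^{\bar{Q}}_{\nu'\otimes\nu}\left[\prod_{i=0}^n \bar{g}(\bar{X}_i,y_i)\mathbf{1}_A(X_n)\right]}{\mathbb{E}^Q_\nu\left[\prod_{i=0}^n g(X_i,y_i)\right]\mathbb{E}^Q_{\nu'}\left[\prod_{i=0}^n g(X_i,y_i)\right]} \tag{12}$$

$$= \frac{\mathbb{E}^{\bar{Q}}_{\nu\otimes\nu'}\left[\prod_{i=0}^n \bar{g}(\bar{X}_i,y_i)\left\{\mathbf{1}_A(X_n) - \mathbf{1}_A(X'_n)\right\}\right]}{\mathbb{E}^Q_\nu\left[\prod_{i=0}^n g(X_i,y_i)\right]\mathbb{E}^Q_{\nu'}\left[\prod_{i=0}^n g(X_i,y_i)\right]} , \tag{13}$$

where $\bar{X}_i = (X_i,X'_i)$ denotes the chain with kernel $\bar{Q}$.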

We compute bounds for the numerator and the denominator of the previous
expression. Such bounds are given in the two following propositions (proofs
are postponed to Section 4). For an LD-set function $C$, define:



(14) 



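Proposition 2. Let $C$ be an LD-set function and $\nu$ and $\nu'$ be two
probability measures on $(\mathsf{X},\mathcal{X})$. For any integer $n$ and any sequence $\{y_i\}_{i=0}^n$ in
$\mathsf{Y}$, let us define

$$\Delta_n(\nu,\nu',y_{0:n}) := \sup_{A\in\mathcal{X}} \left| \mathbb{E}^{\bar{Q}}_{\nu\otimes\nu'}\left[\prod_{i=0}^n \bar{g}(\bar{X}_i,y_i)\mathbf{1}_A(X_n)\right] - \mathbb{E}^{\bar{Q}}_{\nu'\otimes\nu}\left[\prod_{i=0}^n \bar{g}(\bar{X}_i,y_i)\mathbf{1}_A(X_n)\right] \right| . \tag{15}$$

Then,

$$\Delta_n(\nu,\nu',y_{0:n}) \leq \mathbb{E}^{\bar{Q}}_{\nu\otimes\nu'}\left[\bar{g}(\bar{X}_0,y_0)\prod_{i=1}^n \bar{g}(\bar{X}_i,y_i)\,\rho_C^{\delta_i}(y_{i-1},y_i)\right] ,$$

where $\delta_i = \mathbf{1}_{\bar{C}(y_{i-1})\times\bar{C}(y_i)}(\bar{X}_{i-1},\bar{X}_i)$.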

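Proposition 3. Let $C$ be an LD-set function and $\{y_i\}_{i=0}^n$ a sequence in
$\mathsf{Y}$. We have, for all $n\in\mathbb{N}$,

$$\mathbb{E}^Q_\nu\left[\prod_{i=0}^n g(X_i,y_i)\right] \geq \Phi_{\nu,C}(y_0,y_1)\prod_{i=2}^n \epsilon_C^-(y_{i-1},y_i)\prod_{i=2}^n \Psi_C(y_{i-1},y_i) .$$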
By combining these two propositions, we obtain an explicit bound for
the total variation distance $\|\phi_{\nu,n}[y_{0:n}] - \phi_{\nu',n}[y_{0:n}]\|_{\mathrm{TV}}$. It is worthwhile to
note that the bound we obtain is valid for any sequence $y_{0:n}$ and any initial
distributions $\nu$ and $\nu'$. To state the result, some additional notations are
required. Under assumption (H2), for a fixed $\eta > 0$ and a corresponding
LD-set function $C_\eta$, let us define, for $\alpha\in(0,1)$ and a sequence $y_{0:n} = \{y_i\}_{i=0}^n$
in $\mathsf{Y}$,

$$\Lambda_\eta(y_{0:n},\alpha) := \max\left\{ \prod_{k=1}^n \rho_\eta^{\delta_k}(y_{k-1},y_k) \;:\; \{\delta_k\}_{k=1}^n\in\{0,1\}^n ,\ \sum_{k=1}^n \delta_k \geq \alpha n \right\} , \tag{16}$$

where $\rho_\eta$ is a shorthand notation for $\rho_{C_\eta}$ (see (14)).
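The maximum in (16) does not require enumerating the $2^n$ binary sequences. The sketch below rests on one assumption that is labeled here because it is not visible in the definition itself: every factor $\rho_\eta(y_{k-1},y_k)$ lies in $[0,1]$. Under that assumption, adding an index to the selection can only decrease the product, so the maximum is reached by keeping exactly the $\lceil \alpha n\rceil$ largest factors. The function names and the input list `rho` are hypothetical.

```python
import math
from itertools import product

def lambda_eta(rho, alpha):
    """Max over delta in {0,1}^n with sum(delta) >= alpha*n of prod(rho[k] ** delta[k]).

    Assumes every rho[k] lies in [0, 1]; the maximum then keeps the
    ceil(alpha*n) largest factors and drops the others.
    """
    n = len(rho)
    m = math.ceil(alpha * n)
    return math.prod(sorted(rho, reverse=True)[:m])

def lambda_eta_bruteforce(rho, alpha):
    """Direct enumeration of definition (16); exponential in n, for checking only."""
    n = len(rho)
    best = 0.0
    for delta in product((0, 1), repeat=n):
        if sum(delta) >= alpha * n:
            best = max(best, math.prod(r for r, d in zip(rho, delta) if d))
    return best

rho = [0.9, 0.2, 0.7, 0.5]          # hypothetical values of rho_eta(y_{k-1}, y_k)
print(lambda_eta(rho, 0.5), lambda_eta_bruteforce(rho, 0.5))
```

The sorted version costs $O(n\log n)$, which makes it practical to monitor $n^{-1}\log\Lambda_\eta$ along an observation sequence.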

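Theorem 4. Let $\alpha$ be some number in $(0,1)$, $\nu$ and $\nu'$ some probability
measures on $(\mathsf{X},\mathcal{X})$ and $\{y_i\}_{i=0}^n$ a sequence in $\mathsf{Y}$. Then,

$$\left\|\phi_{\nu,n}[y_{0:n}] - \phi_{\nu',n}[y_{0:n}]\right\|_{\mathrm{TV}} \leq \Lambda_\eta(y_{0:n},\alpha) + \eta^{a_n}\,\frac{\prod_{i=0}^n \Upsilon_{\mathsf{X}}^2(y_i)}{\prod_{i=2}^n \left(\epsilon_C^-(y_{i-1},y_i)\right)^2\Psi_C^2(y_{i-1},y_i)\;\Phi_{\nu,C}(y_0,y_1)\,\Phi_{\nu',C}(y_0,y_1)} , \tag{17}$$

with $a_n := \lfloor (1-\alpha)n/2 \rfloor$, where $C = C_\eta$ denotes the LD-set function associated with $\eta$.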

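Proof. The expression (12) together with Proposition 2 implies

$$\left\|\phi_{\nu,n}[y_{0:n}] - \phi_{\nu',n}[y_{0:n}]\right\|_{\mathrm{TV}} \leq \frac{\Delta_n(\nu,\nu',y_{0:n})}{\mathbb{E}^Q_\nu\left[\prod_{i=0}^n g(X_i,y_i)\right]\mathbb{E}^Q_{\nu'}\left[\prod_{i=0}^n g(X_i,y_i)\right]} ,$$

where $\Delta_n(\nu,\nu',y_{0:n})$ is defined by (15). Set

$$N_{\bar{C},n} := \sum_{i=1}^{n} \mathbf{1}\{\bar{X}_{i-1}\in\bar{C}(y_{i-1})\}\,\mathbf{1}\{\bar{X}_i\in\bar{C}(y_i)\} , \qquad M_{\bar{C}^c,n} := \sum_{i=0}^{n-1} \mathbf{1}_{\bar{C}^c(y_i)}(\bar{X}_i) .$$

For any sequence $\{u_j\}$ such that $u_j\in\{0,1\}$ for $j\in\{0,\dots,n\}$,

$$n \geq \sum_{i=0}^{n-1} u_i\vee u_{i+1} = \sum_{i=0}^{n-1}\left(u_i + u_{i+1} - u_iu_{i+1}\right) \geq 2\sum_{i=0}^{n-1} u_i - 1 - \sum_{i=0}^{n-1} u_iu_{i+1} ,$$

which implies that $\sum_{i=0}^{n-1} u_i \leq (n+1)/2 + \frac{1}{2}\sum_{i=0}^{n-1} u_iu_{i+1}$. Using this inequality
with $u_i = \mathbf{1}\{\bar{X}_i\in\bar{C}(y_i)\}$ for $i\in\{0,\dots,n\}$ shows that $N_{\bar{C},n} < \alpha n$ implies
that $M_{\bar{C}^c,n} \geq a_n$. Therefore, using Proposition 2, we obtain

$$\Delta_n(\nu,\nu',y_{0:n}) \leq \mathbb{E}^{\bar{Q}}_{\nu\otimes\nu'}\left[\bar{g}(\bar{X}_0,y_0)\prod_{i=1}^n \bar{g}(\bar{X}_i,y_i)\,\rho_\eta^{\delta_i}(y_{i-1},y_i)\,\mathbf{1}\{N_{\bar{C},n}\geq\alpha n\}\right]
+ \mathbb{E}^{\bar{Q}}_{\nu\otimes\nu'}\left[\bar{g}(\bar{X}_0,y_0)\prod_{i=1}^n \bar{g}(\bar{X}_i,y_i)\,\rho_\eta^{\delta_i}(y_{i-1},y_i)\,\mathbf{1}\{N_{\bar{C},n}<\alpha n\}\right] , \tag{18}$$

with $\delta_i = \mathbf{1}_{\bar{C}(y_{i-1})\times\bar{C}(y_i)}(\bar{X}_{i-1},\bar{X}_i)$. The last term in the right-hand side of
(18) satisfies

$$\mathbb{E}^{\bar{Q}}_{\nu\otimes\nu'}\left[\bar{g}(\bar{X}_0,y_0)\prod_{i=1}^n \bar{g}(\bar{X}_i,y_i)\,\rho_\eta^{\delta_i}(y_{i-1},y_i)\,\mathbf{1}\{N_{\bar{C},n}<\alpha n\}\right] \leq \mathbb{E}^{\bar{Q}}_{\nu\otimes\nu'}\left[\prod_{i=0}^n \bar{g}(\bar{X}_i,y_i)\,\mathbf{1}\{M_{\bar{C}^c,n}\geq a_n\}\right] .$$

By splitting this last product, we obtain

$$\prod_{i=0}^n \bar{g}(\bar{X}_i,y_i)\,\mathbf{1}\{M_{\bar{C}^c,n}\geq a_n\}
= \prod_{\substack{0\leq i\leq n,\\ \bar{X}_i\in\bar{C}(y_i)}} \bar{g}(\bar{X}_i,y_i) \times \prod_{\substack{0\leq i\leq n,\\ \bar{X}_i\in\bar{C}^c(y_i)}} \bar{g}(\bar{X}_i,y_i)\,\mathbf{1}\{M_{\bar{C}^c,n}\geq a_n\}
\leq \eta^{a_n} \times \prod_{\substack{0\leq i\leq n,\\ \bar{X}_i\in\bar{C}(y_i)}} \Upsilon_{\mathsf{X}}^2(y_i) \times \prod_{\substack{0\leq i\leq n,\\ \bar{X}_i\in\bar{C}^c(y_i)}} \Upsilon_{\mathsf{X}}^2(y_i) ,$$

where the inequality uses (9) with $\eta\in(0,1]$: when $\bar{X}_i\in\bar{C}^c(y_i)$, at least one coordinate of $\bar{X}_i$
lies outside $C(y_i)$, so that $\bar{g}(\bar{X}_i,y_i)\leq\eta\,\Upsilon_{\mathsf{X}}^2(y_i)$. This implies
$\mathbb{E}^{\bar{Q}}_{\nu\otimes\nu'}\left[\prod_{i=0}^n \bar{g}(\bar{X}_i,y_i)\mathbf{1}\{M_{\bar{C}^c,n}\geq a_n\}\right] \leq \eta^{a_n}\prod_{i=0}^n \Upsilon_{\mathsf{X}}^2(y_i)$. The
first term in the right-hand side of (18) satisfies

$$\mathbb{E}^{\bar{Q}}_{\nu\otimes\nu'}\left[\bar{g}(\bar{X}_0,y_0)\prod_{i=1}^n \bar{g}(\bar{X}_i,y_i)\prod_{i=1}^n \rho_\eta^{\delta_i}(y_{i-1},y_i)\,\mathbf{1}\{N_{\bar{C},n}\geq\alpha n\}\right] \leq \mathbb{E}^{\bar{Q}}_{\nu\otimes\nu'}\left[\prod_{i=0}^n \bar{g}(\bar{X}_i,y_i)\right]\Lambda_\eta(y_{0:n},\alpha) .$$

By combining the above relations with Proposition 3 and the identity
$\mathbb{E}^{\bar{Q}}_{\nu\otimes\nu'}\left[\prod_{i=0}^n \bar{g}(\bar{X}_i,y_i)\right] = \mathbb{E}^Q_\nu\left[\prod_{i=0}^n g(X_i,y_i)\right]\mathbb{E}^Q_{\nu'}\left[\prod_{i=0}^n g(X_i,y_i)\right]$, the result follows. □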



The last step consists in finding conditions under which the bound in the
right-hand side of (17) is small. This bound depends explicitly on the
observations; it is therefore not difficult to state general conditions under
which this quantity is small. Let $\{Y_k\}_{k\geq 0}$ be a $(\mathsf{Y},\mathcal{Y})$-valued stochastic process with
probability distribution $\mathbb{P}_\star$, which is not necessarily related to the model
under which the filter is computed. We first formulate an almost sure convergence
result for the total variation distance of the filter initialized with two different
probability measures $\nu$ and $\nu'$, and then establish a convergence of the
expectation.

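Theorem 5. Assume (H1) and (H2). Assume moreover that there exists
some LD-set function $C$ such that

$$\liminf_{n\to\infty} n^{-1}\sum_{k=2}^n \log\epsilon_C^-(Y_{k-1},Y_k) > -M , \qquad \mathbb{P}_\star\text{-a.s.} \tag{19}$$

$$\limsup_{n\to\infty} n^{-1}\sum_{k=0}^n \log\Upsilon_{\mathsf{X}}(Y_k) < M , \qquad \mathbb{P}_\star\text{-a.s.} \tag{20}$$

$$\liminf_{n\to\infty} n^{-1}\sum_{k=2}^n \log\Psi_C(Y_{k-1},Y_k) > -M , \qquad \mathbb{P}_\star\text{-a.s.} \tag{21}$$

for some constant $M > 0$. Assume in addition that, for all $\eta > 0$ and
$\alpha\in(0,1)$,

$$\limsup_{n\to\infty} n^{-1}\log\Lambda_\eta(Y_{0:n},\alpha) < 0 , \qquad \mathbb{P}_\star\text{-a.s.} \tag{22}$$

Then, for any initial probability distributions $\nu$ and $\nu'$ on $(\mathsf{X},\mathcal{X})$ such that

$$\nu Q\mathbf{1}_{C(Y_1)} > 0 , \quad \mathbb{P}_\star\text{-a.s.} , \qquad \nu' Q\mathbf{1}_{C(Y_1)} > 0 , \quad \mathbb{P}_\star\text{-a.s.} ,$$

we have

$$\limsup_{n\to\infty} n^{-1}\log\left\|\phi_{\nu,n}[Y_{0:n}] - \phi_{\nu',n}[Y_{0:n}]\right\|_{\mathrm{TV}} < 0 , \qquad \mathbb{P}_\star\text{-a.s.}$$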

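Proof of Theorem 5. Under the stated assumptions, there exist an
LD-set function $C$ and some constant $M > 0$ such that

$$\limsup_{n\to\infty} \exp(-2Mn)\prod_{i=2}^n \left(\epsilon_C^-(Y_{i-1},Y_i)\right)^{-2} \leq 1 , \qquad \mathbb{P}_\star\text{-a.s.}$$

$$\limsup_{n\to\infty} \exp(-2Mn)\prod_{i=0}^n \Upsilon_{\mathsf{X}}^2(Y_i) \leq 1 , \qquad \mathbb{P}_\star\text{-a.s.}$$

$$\limsup_{n\to\infty} \exp(-2Mn)\prod_{i=2}^n \Psi_C^{-2}(Y_{i-1},Y_i) \leq 1 , \qquad \mathbb{P}_\star\text{-a.s.}$$

Let $\alpha$ be some number in $(0,1)$. Since $a_n = \frac{1-\alpha}{2}\,n + o(n)$, by choosing $\eta$
small enough, it follows that

$$\limsup_{n\to\infty} \eta^{a_n}\prod_{i=2}^n \left(\epsilon_C^-(Y_{i-1},Y_i)\right)^{-2}\Psi_C^{-2}(Y_{i-1},Y_i)\prod_{i=0}^n \Upsilon_{\mathsf{X}}^2(Y_i)
\leq \limsup_{n\to\infty} \eta^{a_n}e^{6Mn} \leq \limsup_{n\to\infty} e^{-cn}$$

for some $c > 0$. The proof is concluded by using inequality (17) and (22). □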



The assumptions linking the LD-set function and the observations make
this theorem quite abstract. With a filtering model defined by specific
equations, assumptions can be directly formulated on the model and on the
observations. Such situations are described through the examples presented
in Section 3.

Compared to [8, Theorem 1] in the ergodic case, the conditions (19) and
(22) are specific to the non-ergodic case, since they involve the functions $\epsilon_C^-$
and $\epsilon_C^+$. In the ergodic case, these functions are constant and assumptions
(19) and (22) are trivially satisfied.



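Theorem 6. Assume (H1) and (H2). Let $C$ be an LD-set function. Then,
for any $M_i > 0$, $i = 0,\dots,3$, $\delta > 0$ and $\alpha\in(0,1)$, there exist constants
$\eta > 0$ and $\beta\in(0,1)$ such that, for all $n\in\mathbb{N}$,

$$\mathbb{E}_\star\left[\left\|\phi_{\nu,n}[Y_{0:n}] - \phi_{\nu',n}[Y_{0:n}]\right\|_{\mathrm{TV}}\right] \leq \beta^n + r_0(\nu,n) + r_0(\nu',n) + \sum_{i=1}^4 r_i(n) , \tag{23}$$

where the sequences in the right-hand side of (23) are defined by

$$r_0(\nu,n) := \mathbb{P}_\star\left(\log\Phi_{\nu,C}(Y_0,Y_1) < -M_0 n\right) , \tag{24}$$

$$r_1(n) := \mathbb{P}_\star\left(\sum_{k=2}^n \log\epsilon_C^-(Y_{k-1},Y_k) < -M_1 n\right) , \tag{25}$$

$$r_2(n) := \mathbb{P}_\star\left(\sum_{k=0}^n \log\Upsilon_{\mathsf{X}}(Y_k) > M_2 n\right) , \tag{26}$$

$$r_3(n) := \mathbb{P}_\star\left(\sum_{k=2}^n \log\Psi_C(Y_{k-1},Y_k) < -M_3 n\right) , \tag{27}$$

$$r_4(n) := \mathbb{P}_\star\left(\log\Lambda_\eta(Y_{0:n},\alpha) > -\delta n\right) . \tag{28}$$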

The proof is along the same lines as above and is left to the reader. This
result does not directly provide a rate of convergence. Indeed, only the first
term of the right-hand side of (23) gives a geometric rate. In Section 3,
for given filtering equations, explicit bounds for the other terms will
be obtained with geometric rates. As for the pathwise convergence, the
terms $r_1$ and $r_4$, which involve the functions $\epsilon_C^-$ and $\epsilon_C^+$, are specific to the
non-ergodic case.

3. Nonlinear state-space models. Let $\mathsf{X} = \mathbb{R}^n$ and $\mathsf{Y} = \mathbb{R}^p$ with
$p \leq n$, endowed with the Borel $\sigma$-algebras $\mathcal{X}$ and $\mathcal{Y}$. We consider the model:

$$\begin{cases} X_k = f(X_{k-1}) + \zeta_k , \\ Y_k = h(X_k) + \varepsilon_k , \end{cases} \tag{29}$$

where $f$ and $h$ denote some measurable functions. The observation noise
$\{\varepsilon_k\}_{k\geq 0}$ is a sequence of i.i.d. random variables with positive density $\upsilon$ with
respect to the Lebesgue measure $\lambda^{\mathrm{Leb}}$ on $\mathsf{Y}$. We consider the following
assumptions:

(E1) $f$ is $a$-Lipschitz, i.e. $|f(x) - f(y)| \leq a|x - y|$, $h$ is uniformly
continuous and surjective and there exist constants $b_0$ and $b$ such that, for all
$y_1, y_2\in\mathsf{Y}$ and all $x_1, x_2\in\mathsf{X}$ in the preimages of $y_1$ and $y_2$,

$$|x_1 - x_2| \leq b_0 + b\,|y_1 - y_2| .$$

(E2) The density $\upsilon$ is bounded, and $\lim_{|u|\to\infty}\upsilon(u) = 0$. Moreover, for every
compact set $K\subset\mathsf{Y}$, the quantity $\inf_{y\in K}\upsilon(y)$ is positive.

Notice that $f$ is not necessarily contracting, so that the model is possibly
non-ergodic. The assumption (E1) was first considered in [15]. A function
$h$ satisfying (E1) can be viewed as a perturbation of a bijective function
whose inverse is $b$-Lipschitz. The rationale for considering such an assumption
is the following. For two successive observations $y_1, y_2\in\mathsf{Y}$, the distance
between inverse images of $y_1$, $y_2$ cannot be arbitrarily large. Even if $h$ is
not bijective, the distance $|y_1 - y_2|$ gives information on the distance between two
successive preimage states. The assumption (E2) is more classical and is
satisfied, for example, by Gaussian densities.

We first consider the simplest situation where the state noise is a sequence
of i.i.d. random variables independent of the observation noise $\{\varepsilon_k\}_{k\geq 0}$ and
the observations are distributed according to the model. Then, we study
more general dependence structures of the state noise distribution and the
case where the observations do not necessarily follow the model.
3.1. Nonlinear state-space model with i.i.d. state noise. In this section,
we assume that the state noise $\{\zeta_k\}_{k\geq 0}$ is a sequence of i.i.d. random
variables with positive density $\gamma$ with respect to the Lebesgue measure
$\lambda^{\mathrm{Leb}}$ and independent of the observation noise $\{\varepsilon_k\}_{k\geq 0}$. Then, for any $A\in\mathcal{X}$,

$$Q(x,A) = \int_A \gamma[x' - f(x)]\,\lambda^{\mathrm{Leb}}(dx') . \tag{30}$$

For any $\Delta\in(0,\infty)$, let us define the following set-valued function from $\mathsf{Y}$
to $\mathcal{X}$ by

$$y \mapsto C(y,\Delta) := \{x\in\mathsf{X} : |h(x) - y| \leq \Delta\} . \tag{31}$$

For any $y\in\mathsf{Y}$, $C(y,\Delta)$ is included in a neighborhood of the preimage of $y$.
Indeed, under assumption (E1), for any $z\in\mathsf{X}$ in the preimage of $y$, and any
$x\in C(y,\Delta)$,

$$|x - z| \leq b_0 + b\Delta .$$

Let $(y,y')\in\mathsf{Y}^2$. By condition (E1), $h$ is surjective, so the preimages of $y$
and $y'$ by $h$ are non-empty. We choose arbitrarily $z$ and $z'$ in these preimages:
$y = h(z)$ and $y' = h(z')$. By the triangle inequality and condition (E1),
it follows that, for all $(x,x')\in C(y,\Delta)\times C(y',\Delta)$,

$$|f(x) - x'| \leq |f(x) - f(z)| + |f(z) - z'| + |z' - x'| \leq a(b_0 + b\Delta) + D(y,y') + b_0 + b\Delta , \tag{32}$$

where $D$ is defined by

$$D(y,y') := \sup\{|f(z) - z'| : (z,z')\in\mathsf{X}^2 \text{ with } h(z) = y,\ h(z') = y'\} . \tag{33}$$

For any $r > 0$, we consider the minimum and the maximum of the state
noise density over a ball of radius $r$:

$$\gamma^-(r) := \inf_{|s|\leq r}\gamma(s) , \qquad \gamma^+(r) := \sup_{|s|\leq r}\gamma(s) . \tag{34}$$

It follows from (30) and (32) that, for all $A\in\mathcal{X}$ and $x\in C(y,\Delta)$,

$$\epsilon_\Delta^-(y,y')\,\lambda^{\mathrm{Leb}}[A\cap C(y',\Delta)] \leq Q[x, A\cap C(y',\Delta)] \leq \epsilon_\Delta^+(y,y')\,\lambda^{\mathrm{Leb}}[A\cap C(y',\Delta)] , \tag{35}$$

where

$$\epsilon_\Delta^-(y,y') := \gamma^-[(a+1)b_0 + (a+1)b\Delta + D(y,y')] , \qquad \epsilon_\Delta^+(y,y') := \gamma^+[(a+1)b_0 + (a+1)b\Delta + D(y,y')] .$$
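To make the constants in (35) concrete, the following sketch evaluates them in a hypothetical scalar special case that is not taken from the paper: $f$ and $h$ are the identity on $\mathbb{R}$ (so $a = 1$, $b_0 = 0$, $b = 1$ and $D(y,y') = |y - y'|$) and $\gamma$ is a centered Gaussian density. For such a $\gamma$, the infimum over the ball of radius $r$ is $\gamma(r)$ and the supremum is $\gamma(0)$. The function names and default arguments are illustrative assumptions.

```python
import math

def gamma(s, sigma=1.0):
    """Centered Gaussian density of the state noise (an assumed choice of gamma)."""
    return math.exp(-0.5 * (s / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def ld_constants(y, y_next, delta, a=1.0, b0=0.0, b=1.0, D=None, sigma=1.0):
    """Return (eps_minus, eps_plus) as in (35) for a centered Gaussian gamma.

    The radius is r = (a+1)*b0 + (a+1)*b*delta + D(y, y_next).
    D defaults to |y - y_next|, its value when f and h are the identity on R.
    """
    if D is None:
        D = abs(y - y_next)
    r = (a + 1) * b0 + (a + 1) * b * delta + D
    # For a centered Gaussian: inf over |s| <= r is gamma(r), sup is gamma(0).
    return gamma(r, sigma), gamma(0.0, sigma)

eps_minus, eps_plus = ld_constants(y=0.3, y_next=2.0, delta=0.5)
print(eps_minus, eps_plus)
```

In this special case $C(y,\Delta) = [y-\Delta, y+\Delta]$, and the transition density $\gamma(x'-x)$ stays between the two constants for every $x\in C(y,\Delta)$ and $x'\in C(y',\Delta)$. The lower constant decays like a Gaussian tail in $|y - y'|$, which is the reason the averages of $\log\epsilon_\Delta^-$ along the observations have to be controlled.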
Since $\gamma$ is a positive density, it follows by (35) that the set-valued function defined
by (31) is an LD-set function. By assumption (E2), for all $\eta > 0$, we may
choose $\Delta$ large enough so that $\sup_{|s|>\Delta}\upsilon(s) \leq \eta\,\sup_{s\in\mathsf{Y}}\upsilon(s)$, which implies
that assumption (H2),

$$\Upsilon_{C^c(y,\Delta)}(y) \leq \eta\,\Upsilon_{\mathsf{X}}(y) , \tag{36}$$

is satisfied. The positivity of $\upsilon$ implies assumption (H1).

To check assumptions (19) and (22), it is required to compute an upper
bound for $\{D(Y_{k-1},Y_k)\}_{k\geq 1}$. For $z, z'\in\mathsf{X}$ such that $h(z) = Y_{k-1}$, $h(z') = Y_k$,
it follows from the triangle inequality and assumption (E1) that

$$|f(z) - z'| \leq |f(z) - f(X_{k-1})| + |f(X_{k-1}) - X_k| + |X_k - z'| \leq a(b_0 + b|\varepsilon_{k-1}|) + |\zeta_k| + b_0 + b|\varepsilon_k| .$$

Therefore, for all integer $k\geq 1$,

$$D(Y_{k-1},Y_k) \leq (a+1)b_0 + ab|\varepsilon_{k-1}| + |\zeta_k| + b|\varepsilon_k| . \tag{37}$$

Thanks to this bound, assumptions (19) and (22) are satisfied by applying
the law of large numbers; see Propositions 7 and 9 and their proofs. Since
$\gamma^-$ is a non-increasing function, it follows by (37) that, for all integer $k\geq 1$,
$\log\epsilon_\Delta^-(Y_{k-1},Y_k) \geq -Z_k^\Delta$, where for all $\Delta > 0$ and all integer $k\geq 1$,

$$Z_k^\Delta := -\log\gamma^-\left[2(a+1)b_0 + (a+1)b\Delta + ab|\varepsilon_{k-1}| + |\zeta_k| + b|\varepsilon_k|\right] . \tag{38}$$

Proposition 7. Let us consider the filtering model defined by (29).
Assume (E1), (E2) and, for all $\Delta > 0$,

$$\mathbb{E}\left|Z_1^\Delta\right| < \infty . \tag{39}$$

Let $\{Y_k\}_{k\geq 0}$ be the sequence of observations produced by the filtering
equations (29) and let $C$ be the LD-set function defined by (31). Then, for any
initial probability distributions $\nu$ and $\nu'$ on $(\mathsf{X},\mathcal{X})$ and $\Delta > 0$ such that

$$\nu Q\mathbf{1}_{C(Y_1,\Delta)} > 0 , \quad \mathbb{P}_\star\text{-a.s.} , \qquad \nu' Q\mathbf{1}_{C(Y_1,\Delta)} > 0 , \quad \mathbb{P}_\star\text{-a.s.} ,$$

we have

$$\limsup_{n\to\infty} n^{-1}\log\left\|\phi_{\nu,n}[Y_{0:n}] - \phi_{\nu',n}[Y_{0:n}]\right\|_{\mathrm{TV}} < 0 , \qquad \mathbb{P}_\star\text{-a.s.}$$

The condition (39) is not very restrictive. For example, let us assume
that $\gamma$ is a centered Gaussian density and that $\{\zeta_k\}_{k\geq 0}$ and $\{\varepsilon_k\}_{k\geq 0}$ are
sequences of Gaussian random variables. It follows that $\gamma^-(r) = \gamma(r)$ for
all $r\geq 0$. The condition (39) then holds if $\mathbb{E}(|\varepsilon_1|^2) < \infty$ and $\mathbb{E}(|\zeta_1|^2) < \infty$, which
are trivially satisfied.
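The Gaussian example above can be explored numerically. The sketch below is an illustration under explicit assumptions, none of which come from the paper: the model is the scalar random walk $X_k = X_{k-1} + \zeta_k$, $Y_k = X_k + \varepsilon_k$ with standard Gaussian noises (so $f$ and $h$ are the identity and the chain is non-ergodic), the filter is approximated on a fixed grid, which effectively truncates the state space, and the seed, grid and initial distributions are arbitrary choices.

```python
import math
import random

def gauss(u):
    """Unnormalized standard Gaussian kernel; constants cancel after normalization."""
    return math.exp(-0.5 * u * u)

def grid_filter(init, grid, ys):
    """Grid approximation of phi_{nu,n}[y_{0:n}] for X_k = X_{k-1} + zeta_k, Y_k = X_k + eps_k."""
    m = len(grid)
    kernel = [[gauss(grid[j] - grid[i]) for j in range(m)] for i in range(m)]
    w = [init[i] * gauss(ys[0] - grid[i]) for i in range(m)]
    z = sum(w)
    phi = [v / z for v in w]
    for y in ys[1:]:
        # Prediction through the random-walk kernel, then likelihood correction.
        pred = [sum(phi[i] * kernel[i][j] for i in range(m)) for j in range(m)]
        w = [pred[j] * gauss(y - grid[j]) for j in range(m)]
        z = sum(w)
        phi = [v / z for v in w]
    return phi

def tv(p, q):
    """sup_A |p(A) - q(A)| for two distributions on the grid."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

# Observations simulated from the model itself, with a fixed seed.
rng = random.Random(0)
x, ys = 0.0, []
for k in range(31):
    if k > 0:
        x += rng.gauss(0.0, 1.0)
    ys.append(x + rng.gauss(0.0, 1.0))

grid = [-30.0 + 0.5 * i for i in range(121)]
nu_a = [gauss(g + 5.0) for g in grid]   # initial law concentrated near -5
nu_b = [gauss(g - 5.0) for g in grid]   # initial law concentrated near +5
tv_start = tv(grid_filter(nu_a, grid, ys[:1]), grid_filter(nu_b, grid, ys[:1]))
tv_end = tv(grid_filter(nu_a, grid, ys), grid_filter(nu_b, grid, ys))
print(tv_start, tv_end)
```

The two filters start almost mutually singular and become numerically indistinguishable after a few dozen observations, in line with the pathwise statement of Proposition 7. The grid truncation means this is only a finite-state surrogate of the model on $\mathbb{R}$, so the experiment is suggestive rather than a verification.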

With more stringent conditions on the initial laws, geometric rates hold for
the convergence of the expected value of the total variation distance. Let us recall
the definition of the log-moment generating function that will be used in the
sequel.

Definition 8. The log-moment generating function $\psi_Z(\lambda)$ of the random
variable $Z$ is defined on the set $\{\lambda\geq 0 : \mathbb{E}[e^{\lambda Z}] < \infty\}$ by
$\psi_Z(\lambda) := \log\mathbb{E}[e^{\lambda Z}]$.

Proposition 9. Let us consider the filtering model defined by (29) and
satisfying (E1), (E2) and, for all $\Delta > 0$, there exists $r > 0$ such that

$$\psi_{Z_1^\Delta} \text{ is finite on } [0,r) . \tag{40}$$

Let $\{Y_k\}_{k\geq 0}$ be the sequence produced by the filtering equations (29) and
let $C$ denote the LD-set function defined by (31). Then, for $\nu$ and $\nu'$ two
probability measures on $(\mathsf{X},\mathcal{X})$ and $\Delta > 0$ such that, for some $\lambda > 0$,

$$\mathbb{E}_\star\left\{\exp\left(\lambda\left[\log\nu g(\cdot,Y_0)Q\mathbf{1}_{C(Y_1,\Delta)}\right]_-\right)\right\} < \infty , \qquad \mathbb{E}_\star\left\{\exp\left(\lambda\left[\log\nu' g(\cdot,Y_0)Q\mathbf{1}_{C(Y_1,\Delta)}\right]_-\right)\right\} < \infty , \tag{41}$$

we have

$$\limsup_{n\to\infty} n^{-1}\log\mathbb{E}_\star\left[\left\|\phi_{\nu,n}[Y_{0:n}] - \phi_{\nu',n}[Y_{0:n}]\right\|_{\mathrm{TV}}\right] < 0 .$$

Assume that $\gamma$ is the density of a standard Gaussian random variable.
The condition $\mathbb{E}[e^{\lambda Z_1^\Delta}] < \infty$ is equivalent to

$$\int_{\mathbb{R}^{n+2p}} \exp\left[(\lambda - c_0)|x|^2\right] dx < \infty ,$$

where $c_0$ denotes some positive constant. Therefore, for $\lambda > 0$ small enough,
the condition (40) is satisfied.

The conditions (41) can be interpreted as non-degeneracy conditions.
Indeed, they forbid that $\nu g(\cdot,Y_0)Q\mathbf{1}_{C(Y_1,\Delta)}$ is null almost everywhere, and
the same for $\nu'$. Intuitively, it means that the distribution of the random
variable $\nu g(\cdot,Y_0)Q\mathbf{1}_{C(Y_1,\Delta)}$ is not concentrated close to zero. For example, if
there exists a constant $c > 0$ such that

$$\nu g(\cdot,Y_0)Q\mathbf{1}_{C(Y_1,\Delta)} \geq c , \quad \mathbb{P}_\star\text{-a.s.} , \qquad \nu' g(\cdot,Y_0)Q\mathbf{1}_{C(Y_1,\Delta)} \geq c , \quad \mathbb{P}_\star\text{-a.s.} ,$$

then the conditions (41) are satisfied. Proofs of Propositions 7 and 9 are
given in Section 5.

3.2. Nonlinear state-space model with dependent state noise. We now
consider the case where the state noise $\{\zeta_k\}_{k\geq 0}$ can depend on previous
states. This model was introduced in [15, Section 3] and is important
because it covers the case of partially observed discretely sampled
diffusions, as well as partially observed stochastic volatility models [3, Section 2].
This example illustrates that the forgetting property is kept even when the
distribution of the observations differs from the model.

(G) $\{\zeta_k\}_{k\geq 0}$ is a sequence of random variables such that, for all integer $k$,
$\zeta_k$ is independent of $\varepsilon_k$ and, for all $A\in\mathcal{X}$,

$$\mathbb{P}(\zeta_k\in A \mid X_{k-1} = x) = \int q(x,u)\,\mathbf{1}_A(u)\,\lambda^{\mathrm{Leb}}(du) .$$

Moreover, there exist a positive probability density $\psi$ and positive
constants $\mu^-$, $\mu^+$ such that, for all $x, u\in\mathsf{X}$,

$$\mu^-\psi(u) \leq q(x,u) \leq \mu^+\psi(u) .$$

A first example of state equation satisfying (G) is considered in [3]. A signal
takes its values in $\mathsf{X}$ and follows the equation

$$X_k = f(X_{k-1}) + \sigma(X_{k-1})\xi_k , \tag{42}$$

where $\{\xi_k\}_{k\geq 0}$ is a sequence of i.i.d. random variables and where $\sigma : \mathsf{X}\to\mathbb{R}^{n\times n}$
is a measurable function that satisfies, for all $x, u\in\mathsf{X}$, the following
hypoellipticity condition:

$$\sigma^-|u|^2 \leq \langle u, \sigma(x)\sigma^T(x)u\rangle \leq \sigma^+|u|^2 , \tag{43}$$

where $\sigma^-$, $\sigma^+$ are positive constants and the superscript $T$ denotes
transposition. Another important example where (G) is satisfied is the case of
certain discretely sampled diffusions. Let $(X_t)_{t\geq 0}$ be the unique solution of
the following stochastic differential equation

$$dX_t = \rho(X_t)\,dt + \sigma(X_t)\,dB_t ,$$
where $B$ is the $n$-dimensional Brownian motion and the functions $\rho : \mathbb{R}^n\to\mathbb{R}^n$
and $\sigma : \mathbb{R}^n\to\mathbb{R}^{n\times n}$ are respectively of class $C^1$ and $C^3$. Then, the
sequence $\{X_k\}_{k\geq 0}$ satisfies assumption (G) if the function $\sigma$ is hypoelliptic
(condition (43)); see [15]. The assumptions (E1), (E2) and (G) are a bit
more stringent than those made in [15]. Indeed, in [15], the function $h$ is not
necessarily uniformly continuous and no restrictions are made on $\upsilon$. These
stronger assumptions allow us to establish the forgetting of the initial condition with probability
one without restriction on the signal-to-noise ratio and for sequences of
observations which are not necessarily distributed according to the model
used to compute the filtering distribution. Let us denote by $Q$ the transition
kernel of $\{X_k\}_{k\geq 0}$. Then, for all $A\in\mathcal{X}$ and for all $x\in\mathsf{X}$,

$$Q(x,A) = \int_A q[x, x' - f(x)]\,\lambda^{\mathrm{Leb}}(dx') .$$

For the same reasons as above, we consider the same set-valued function
$C$ as in (31). Let $(y,y')\in\mathsf{Y}^2$. As in (32), it follows by (E1) and the
triangle inequality that, for all $(x,x')\in C(y,\Delta)\times C(y',\Delta)$,

$$|f(x) - x'| \leq c + d\Delta + D(y,y') ,$$

where $D$ is defined in (33), $c = (a+1)b_0$ and $d = (a+1)b$. By setting

$$q^-(r) := \mu^-\times\inf_{|v|\leq r}\psi(v) , \qquad q^+(r) := \mu^+\times\sup_{|v|\leq r}\psi(v) , \tag{44}$$

it follows from condition (G) that, for all $A\in\mathcal{X}$ and $x\in C(y,\Delta)$,

$$\epsilon_\Delta^-(y,y')\,\lambda^{\mathrm{Leb}}[A\cap C(y',\Delta)] \leq Q[x, A\cap C(y',\Delta)] \leq \epsilon_\Delta^+(y,y')\,\lambda^{\mathrm{Leb}}[A\cap C(y',\Delta)] , \tag{45}$$

where

$$\epsilon_\Delta^-(y,y') := q^-[c + d\Delta + D(y,y')] , \qquad \epsilon_\Delta^+(y,y') := q^+[c + d\Delta + D(y,y')] .$$

Since $\psi$ is a positive density, the set-valued function defined by (31) is an LD-set
function. As in Section 3.1, assumptions (H1) and (H2) are satisfied. Assume
now that the process $\{Y_k^*\}_{k\geq 0}$ is generated by the following nonlinear
state-space model:

$$\begin{cases} X_k^* = f^*(X_{k-1}^*) + \zeta_k^* , \\ Y_k^* = h^*(X_k^*) + \varepsilon_k^* , \end{cases} \tag{46}$$

where $\{\varepsilon_k^*\}_{k\geq 0}$ is a sequence of i.i.d. random variables, $f^*$ is $a^*$-Lipschitz, $h^*$
is surjective and, for all $x_1, x_2\in\mathsf{X}$,

$$|x_1 - x_2| \leq b_0^* + b^*\,|h^*(x_1) - h^*(x_2)| ,$$
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for some positive constants 6q, b* . For all integer k > 1 , £j£ is independent 
of e£ and, for all A G X, 

nCk G =x) =y cf (x,u)l A ( U ) A Lcb (dn) . 

There exists probability densities ip* and positive constants //* , such 
that, for all x, u G X, 

(47) < < . 
We assume that 

(01) /* and h* are such that ||/ — f*\\ 00 < oo and \\h — < oo. 

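Lemma 10. Let $\{Y^*_k\}_{k\ge 0}$ be the sequence following (46). Under (O1),
for all integer $k \ge 1$,

$$D(Y^*_{k-1}, Y^*_k) \le \kappa + 2a^*b^* + a^*b^*|\varepsilon^*_{k-1}| + b^*|\varepsilon^*_k| + |\zeta^*_k| ,$$

where

$$\kappa = \|f - f^*\|_\infty + (b_0 + b\,\|h^* - h\|_\infty)(1 + a^*) .$$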

Proof of Lemma 10. For all integer $k \ge 1$, for $z, z' \in \mathsf{X}$ such that
$h(z) = Y^*_{k-1}$, $h(z') = Y^*_k$ and for $u, u' \in \mathsf{X}$ such that $h^*(u) = Y^*_{k-1}$, $h^*(u') = Y^*_k$,
it follows by the triangle inequality that

$$|f(z) - z'| \le |f(z) - f^*(z)| + |f^*(z) - f^*(u)| + |f^*(u) - u'| + |u' - z'|$$
$$\le \|f - f^*\|_\infty + a^*|z - u| + |f^*(u) - u'| + |u' - z'| . \tag{48}$$

Let us notice that

$$|z - u| \le b_0 + b\,|h(z) - h(u)| \le b_0 + b\,\underbrace{|h(z) - h^*(u)|}_{=0} + b\,|h^*(u) - h(u)| .$$

Then, by denoting $K = b_0 + b\,\|h^* - h\|_\infty$, it follows that $|z - u| \le K$ and,
for the same reasons, $|z' - u'| \le K$. Combining these two bounds with
(48) leads to

$$|f(z) - z'| \le \kappa + |f^*(u) - f^*(X^*_{k-1})| + |f^*(X^*_{k-1}) - X^*_k| + |X^*_k - u'|$$
$$\le \kappa + a^*\big[b^*_0 + b^*|h^*(u) - h^*(X^*_{k-1})|\big] + |\zeta^*_k| + b^*_0 + b^*|h^*(X^*_k) - h^*(u')| ,$$

where $\kappa = \|f - f^*\|_\infty + K(1 + a^*)$. Thus, it is proven that, for all integer
$k \ge 1$,

$$D(Y^*_{k-1}, Y^*_k) \le \kappa + 2a^*b^* + a^*b^*|\varepsilon^*_{k-1}| + b^*|\varepsilon^*_k| + |\zeta^*_k| .$$

□

Let us define, for all $\Delta > 0$,

$$V^{*\Delta} = \log q^-\big[c + d\Delta + \kappa + 2a^*b^* + a^*b^*|\varepsilon^*_0| + b^*|\varepsilon^*_1| + |\zeta^*|\big] , \tag{49}$$

where $\zeta^*$ is a random variable independent of $\{\varepsilon^*_k\}_{k\ge 0}$ with density $\psi^*$.

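Proposition 11. Let us consider the filtering model defined by (29) and
satisfying (E1), (E2) and (G). Let $C$ be the LD-set function defined by (31)
and let $\{Y^*_k\}_{k\ge 0}$ be the sequence following (46) such that (O1) holds and,
for all $\Delta > 0$,

$$\mathbb{E}\big(|V^{*\Delta}| \log^+ |V^{*\Delta}|\big) < \infty . \tag{50}$$

Then, for any initial probability distributions $\nu$ and $\nu'$ on $(\mathsf{X}, \mathcal{X})$ and $\Delta > 0$
satisfying

$$\nu Q \mathbb{1}_{C(Y^*_1,\Delta)} > 0 , \quad \mathbb{P}^*\text{-a.s.} , \qquad \nu' Q \mathbb{1}_{C(Y^*_1,\Delta)} > 0 , \quad \mathbb{P}^*\text{-a.s.} ,$$

we have

$$\limsup_{n\to\infty} n^{-1} \log \big\| \phi_{\nu,n}[Y^*_{0:n}] - \phi_{\nu',n}[Y^*_{0:n}] \big\|_{\mathrm{TV}} < 0 , \quad \mathbb{P}^*\text{-a.s.}$$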

This proposition has important consequences. Observations issued from
equations (29) under conditions (E1), (E2) and (G) are a particular case of the
observations produced by (46) under (O1). It is only needed that $\|f - f^*\|_\infty$ and
$\|h - h^*\|_\infty$ are finite to ensure the convergence with probability one.

Let us write $\zeta^*_k = g^*(X^*_{k-1}, A^*_k)$ where $g^*$ denotes a measurable function
and $\{A^*_k\}_{k\ge 0}$ a sequence of i.i.d. random variables with uniform law on $(0,1)$.
We make the following assumptions:

(O3) there exists a measurable function $g^*_+$ such that, for all $x \in \mathsf{X}$ and
$a \in (0,1)$, $|g^*(x,a)| \le g^*_+(a)$;

(O4) let $\{Z^{*\Delta}_k\}_{k\ge 0}$ be the sequence defined by, for all $\Delta > 0$ and for all
integer $k \ge 1$,

$$Z^{*\Delta}_k = -\log q^-\big[c + d\Delta + \kappa + 2a^*b^* + a^*b^*|\varepsilon^*_{k-1}| + b^*|\varepsilon^*_k| + g^*_+(A^*_k)\big] .$$

For all $\Delta > 0$, there exists $r > 0$ such that the log-moment generating
function $\psi_{Z^{*\Delta}}$ is finite on $[0, r)$.

Proposition 12. Let us consider the filtering model defined by (29) and
satisfying (E1), (E2) and (G). Let $\{Y^*_k\}_{k\ge 0}$ be the sequence following (46)
such that (O1), (O3) and (O4) hold and let $C$ be the LD-set function defined
by (31). Then, for $\nu$ and $\nu'$ two probability measures on $(\mathsf{X}, \mathcal{X})$ and $\Delta > 0$
such that, for some $\lambda > 0$,

$$\mathbb{E}^*\Big\{\exp\Big(\lambda\big[\log \nu g(\cdot, Y^*_0) Q \mathbb{1}_{C(Y^*_1,\Delta)}\big]_-\Big)\Big\} < \infty ,$$

$$\mathbb{E}^*\Big\{\exp\Big(\lambda\big[\log \nu' g(\cdot, Y^*_0) Q \mathbb{1}_{C(Y^*_1,\Delta)}\big]_-\Big)\Big\} < \infty ,$$

we have

$$\limsup_{n\to\infty} n^{-1} \log \mathbb{E}^*\Big[\big\| \phi_{\nu,n}[Y^*_{0:n}] - \phi_{\nu',n}[Y^*_{0:n}] \big\|_{\mathrm{TV}}\Big] < 0 .$$

For the convergence in expectation, the restrictive assumption (O3) has
to be made. Note that, for the case considered in [3], this condition
is satisfied since the function $a$ in (12) is bounded. The case of [15] is not
covered by this condition. It seems quite difficult to get the same results as in
[15] with observations not necessarily from an HMM without strengthening
the assumptions on $\{\zeta^*_k\}_{k\ge 0}$. Note that the convergence theorem
of [15] is proved for observations issued from the filtering equations. The
assumption (O4) is of the same type as (40).

Proofs of Propositions 11 and 12 are given in Section 6.

4. Proofs of Propositions 2 and 3.

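Proof of Proposition 2. For convenience, we write $C_i = C(y_i)$, $\varepsilon^-_i =
\varepsilon^-_C(y_{i-1}, y_i)$, $\varepsilon^+_i = \varepsilon^+_C(y_{i-1}, y_i)$, $g_i(x) = g(x, y_i)$, $\lambda_i = \lambda_{y_{i-1},y_i}$ and $\rho_i =
1 - (\varepsilon^-_i/\varepsilon^+_i)^2$. Let us define $\bar\lambda_i = \lambda_i \otimes \lambda_i$. Since $C$ is an LD-set function, for
all $i = 1, \dots, n$, $\bar x \in \bar C_{i-1}$, and $f$ a non-negative function on $\mathsf{X} \times \mathsf{X}$,

$$(\varepsilon^-_i)^2\,\bar\lambda_i(\mathbb{1}_{\bar C_i} f) \le \bar Q(\bar x, \mathbb{1}_{\bar C_i} f) \le (\varepsilon^+_i)^2\,\bar\lambda_i(\mathbb{1}_{\bar C_i} f) . \tag{51}$$

Let us define the sequence of unnormalized kernels $\bar Q^0_i$ and $\bar Q^1_i$ by, for all
$\bar x \in \mathsf{X}^2$, and $f$ a non-negative function on $\mathsf{X} \times \mathsf{X}$,

$$\bar Q^0_i(\bar x, f) = (\varepsilon^-_i)^2\,\mathbb{1}_{\bar C_{i-1}}(\bar x)\,\bar\lambda_i(\mathbb{1}_{\bar C_i} f) ,$$
$$\bar Q^1_i(\bar x, f) = \bar Q(\bar x, f) - (\varepsilon^-_i)^2\,\mathbb{1}_{\bar C_{i-1}}(\bar x)\,\bar\lambda_i(\mathbb{1}_{\bar C_i} f) .$$

It follows from (51) that, for all $\bar x$ in $\bar C_{i-1}$, $0 \le \bar Q^1_i(\bar x, \mathbb{1}_{\bar C_i} f) \le \rho_i\,\bar Q(\bar x, \mathbb{1}_{\bar C_i} f)$,
which implies that, for all $\bar x \in \mathsf{X}^2$,

$$\bar Q^1_i(\bar x, f) = \mathbb{1}_{\bar C_{i-1}}(\bar x)\,\bar Q^1_i(\bar x, \mathbb{1}_{\bar C_i} f) + \mathbb{1}_{\bar C_{i-1}}(\bar x)\,\bar Q^1_i(\bar x, \mathbb{1}_{\bar C_i^c} f) + \mathbb{1}_{\bar C_{i-1}^c}(\bar x)\,\bar Q^1_i(\bar x, f)$$
$$\le \rho_i\,\mathbb{1}_{\bar C_{i-1}}(\bar x)\,\bar Q(\bar x, \mathbb{1}_{\bar C_i} f) + \mathbb{1}_{\bar C_{i-1}}(\bar x)\,\bar Q(\bar x, \mathbb{1}_{\bar C_i^c} f) + \mathbb{1}_{\bar C_{i-1}^c}(\bar x)\,\bar Q(\bar x, f) .$$

We write $\Delta_n(\nu, \nu', y_{0:n}) = \sup_{A \in \mathcal{X}} |\Delta_n(A)|$, where

$$\Delta_n(A) := \nu \otimes \nu'(\bar g_0 \bar Q \bar g_1 \cdots \bar Q \bar g_n \mathbb{1}_{A\times\mathsf{X}}) - \nu' \otimes \nu(\bar g_0 \bar Q \bar g_1 \cdots \bar Q \bar g_n \mathbb{1}_{A\times\mathsf{X}}) .$$

We decompose $\Delta_n(A)$ into $\Delta_n(A) = \sum_{t_{0:n-1} \in \{0,1\}^n} \Delta_n(A, t_{0:n-1})$, where

$$\Delta_n(A, t_{0:n-1}) = \nu \otimes \nu'(\bar g_0 \bar Q^{t_0}_0 \bar g_1 \cdots \bar Q^{t_{n-1}}_{n-1} \bar g_n \mathbb{1}_{A\times\mathsf{X}}) - \nu' \otimes \nu(\bar g_0 \bar Q^{t_0}_0 \bar g_1 \cdots \bar Q^{t_{n-1}}_{n-1} \bar g_n \mathbb{1}_{A\times\mathsf{X}}) .$$

Note that, for any $t_{0:n-1} \in \{0,1\}^n$ and any sets $A, B \in \mathcal{X}$,

$$\nu \otimes \nu'(\bar g_0 \bar Q^{t_0}_0 \bar g_1 \cdots \bar Q^{t_{n-1}}_{n-1} \bar g_n \mathbb{1}_{A\times B}) = \nu' \otimes \nu(\bar g_0 \bar Q^{t_0}_0 \bar g_1 \cdots \bar Q^{t_{n-1}}_{n-1} \bar g_n \mathbb{1}_{B\times A}) .$$

If there is an index $i \in \{0, \dots, n-1\}$ such that $t_i = 0$, then

$$\nu \otimes \nu'(\bar g_0 \bar Q^{t_0}_0 \bar g_1 \cdots \bar Q^{t_{n-1}}_{n-1} \bar g_n \mathbb{1}_{A\times\mathsf{X}})$$
$$= \nu \otimes \nu'(\bar g_0 \bar Q^{t_0}_0 \bar g_1 \cdots \bar g_i \mathbb{1}_{\bar C_i}) \times (\varepsilon^-_{i+1})^2\,\bar\lambda_{i+1}(\mathbb{1}_{\bar C_{i+1}} \bar g_{i+1} \bar Q^{t_{i+1}}_{i+1} \cdots \bar Q^{t_{n-1}}_{n-1} \bar g_n \mathbb{1}_{A\times\mathsf{X}})$$
$$= \nu' \otimes \nu(\bar g_0 \bar Q^{t_0}_0 \bar g_1 \cdots \bar g_i \mathbb{1}_{\bar C_i}) \times (\varepsilon^-_{i+1})^2\,\bar\lambda_{i+1}(\mathbb{1}_{\bar C_{i+1}} \bar g_{i+1} \bar Q^{t_{i+1}}_{i+1} \cdots \bar Q^{t_{n-1}}_{n-1} \bar g_n \mathbb{1}_{A\times\mathsf{X}}) .$$

Thus, $\Delta_n(A, t_{0:n-1}) = 0$ except if, for all $i \in \{0, \dots, n-1\}$, $t_i = 1$, and we
obtain

$$\Delta_n(A) = \nu \otimes \nu'\big[\bar g_0 \bar Q^1_0 \bar g_1 \cdots \bar Q^1_{n-1} \bar g_n (\mathbb{1}_{A\times\mathsf{X}} - \mathbb{1}_{\mathsf{X}\times A})\big] .$$

It then follows

$$\Delta_n(\nu, \nu', y_{0:n}) \le \nu \otimes \nu'(\bar g_0 \bar Q^1_0 \bar g_1 \cdots \bar Q^1_{n-1} \bar g_n) \le \bar{\mathbb{E}}^Q_{\nu\otimes\nu'}\Big[\bar g(\bar X_0, y_0) \prod_{i=1}^n \bar g(\bar X_i, y_i)\,\rho_i^{\delta_i}\Big] ,$$

with $\delta_i = \mathbb{1}_{\bar C_{i-1}\times \bar C_i}(\bar X_{i-1}, \bar X_i)$.

□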



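Proof of Proposition 3. Since $C$ is an LD-set function, there exist
some applications $\varepsilon^-_C$, $\varepsilon^+_C$ such that, for all $i = 1, \dots, n$, for all $x \in C(y_{i-1})$
and for all $A \in \mathcal{X}$ with $A \subset C(y_i)$,

$$\varepsilon^-_C(y_{i-1}, y_i)\,\lambda_{y_{i-1},y_i}(A) \le Q(x, A) \le \varepsilon^+_C(y_{i-1}, y_i)\,\lambda_{y_{i-1},y_i}(A) . \tag{52}$$

Let us write the obvious inequality

$$\mathbb{E}^Q_\nu\Big[g(X_0, y_0) \prod_{i=1}^n g(X_i, y_i)\Big] \ge \mathbb{E}^Q_\nu\Big[g(X_0, y_0) \prod_{i=1}^n g(X_i, y_i)\,\mathbb{1}_{C(y_i)}(X_i)\Big] .$$

Then, for the right-hand side of this expression, by (52) we have

$$\mathbb{E}^Q_\nu\Big[g(X_0, y_0) \prod_{i=1}^n g(X_i, y_i)\,\mathbb{1}_{C(y_i)}(X_i)\Big]$$
$$= \mathbb{E}^Q_\nu\Big[g(X_0, y_0)\,g(X_1, y_1)\,\mathbb{1}_{C(y_1)}(X_1) \prod_{i=2}^n g(X_i, y_i)\,\mathbb{1}_{C(y_{i-1})\times C(y_i)}(X_{i-1}, X_i)\Big]$$
$$\ge \nu\big[g(\cdot, y_0)\,Q\,g(\cdot, y_1)\,\mathbb{1}_{C(y_1)}(\cdot)\big] \prod_{i=2}^n \varepsilon^-_C(y_{i-1}, y_i)\,\lambda_{y_{i-1},y_i}\big[g(\cdot, y_i)\,\mathbb{1}_{C(y_i)}(\cdot)\big] .$$

□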

5. Proofs of Propositions 7 and 9.

Proof of Proposition 7. Since, by definition (34), $\gamma^-$ is a non-increasing
function, the inequality (37) leads to

$$n^{-1} \sum_{k=2}^n \log \varepsilon^-_\Delta(Y_{k-1}, Y_k) \ge -n^{-1} \sum_{k=2}^n Z^\Delta_k , \tag{53}$$

where $Z^\Delta_k$ is defined in (38). Since the process $\{ab|\varepsilon_{k-1}| + |\zeta_k| + b|\varepsilon_k|\}_{k\ge 1}$
is stationary and 2-dependent, the strong law of large numbers for $m$-dependent
sequences and the integrability condition (39) yield

$$\lim_{n\to\infty} n^{-1} \sum_{k=2}^n Z^\Delta_k = \mathbb{E}(Z^\Delta_1) < \infty , \quad \mathbb{P}^*\text{-a.s.} \tag{54}$$

By combining (53) and (54), the first condition (19) of Theorem 5 is
satisfied. By assumption (E2), the density $\upsilon$ is bounded, which implies that
$\sup_{y\in\mathsf{Y}} \Upsilon_{\mathsf{X}}(y) \le \sup \upsilon$. Hence, the second condition (20) of Theorem 5 is
satisfied. We now consider the third condition (21). Since the measure
appearing in the definition of the LD-set function does not depend on $y, y'$,
the function $(y,y') \mapsto \Psi_{C(y',\Delta)}(y,y')$, defined in (17), does not depend on $y$
and is given by

$$\Psi_{C(y',\Delta)}(y,y') = \int_{C(y',\Delta)} \upsilon[y' - h(x)]\,\lambda^{\mathrm{Leb}}(dx) \ge \lambda^{\mathrm{Leb}}[C(y',\Delta)] \times \inf_{|s|\le\Delta} \upsilon(s) .$$

Since the function $h$ is uniformly continuous, for any fixed $\Delta > 0$, there
exists $\delta > 0$ such that, for all $x, x' \in \mathsf{X}$ satisfying $|x - x'| \le \delta$, we have
$|h(x) - h(x')| \le \Delta$, showing that $\lambda^{\mathrm{Leb}}[C(y',\Delta)] \ge \delta$. Thus, we have, for all
$y, y' \in \mathsf{Y}$,

$$\Psi_{C(y',\Delta)}(y,y') \ge q_\Delta , \tag{55}$$

for some $q_\Delta > 0$, depending only on $\Delta$. The third condition (21) of Theorem
5 follows. Since assumption (E2) is satisfied, for any fixed $\eta > 0$, we choose
$\Delta > 0$ such that inequality (36) holds. Let us write

$$R_\Delta(x) = \log\big[1 - (\gamma^-/\gamma^+)^2 (2c + d\Delta + x)\big] . \tag{56}$$

We will repeatedly use the following representation of the so-called L-statistic 
(see pH Chapter 8]): 

Lemma 13. Let $\{U_1, \dots, U_n\}$ be a sequence and $U_{n,1} \le U_{n,2} \le \dots \le U_{n,n}$
the ordered statistics. Then,

$$n^{-1} \sum_{k=j}^n U_{n,k} = \int_{(j-1)/n}^1 F^{-1}_{n,U}(s)\,ds ,$$

where $F^{-1}_{n,U}(s) := \inf\{t \in \mathbb{R},\ F_{n,U}(t) \ge s\}$ is the empirical quantile function,
i.e. the generalized inverse of the empirical distribution function $F_{n,U}(t) =
n^{-1} \sum_{k=1}^n \mathbb{1}\{U_k \le t\}$.
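
The identity of Lemma 13 can be checked numerically on a small sample. The sketch below is only an illustration: the helper names (`empirical_quantile`, `upper_order_mean`, `quantile_integral`) and the sample values are hypothetical, and the integral is computed with a midpoint rule, which is exact here because the empirical quantile function is constant on each interval $((k-1)/n, k/n]$.

```python
import math


def empirical_quantile(sample, s):
    """Generalized inverse F_n^{-1}(s) = inf{t : F_n(t) >= s}, for 0 < s <= 1."""
    ordered = sorted(sample)
    n = len(ordered)
    # F_n jumps to k/n at the k-th order statistic, so the infimum is attained
    # at index k = ceil(s * n); the tolerance guards against float round-off.
    k = max(1, math.ceil(s * n - 1e-12))
    return ordered[k - 1]


def upper_order_mean(sample, j):
    """Left-hand side of Lemma 13: n^{-1} * sum_{k=j}^{n} U_{n,k}."""
    ordered = sorted(sample)
    return sum(ordered[j - 1:]) / len(ordered)


def quantile_integral(sample, j, cells=8):
    """Right-hand side: integral of F_n^{-1} over [(j-1)/n, 1] (midpoint rule)."""
    n = len(sample)
    total = 0.0
    for k in range(j, n + 1):
        for i in range(cells):
            s = (k - 1 + (i + 0.5) / cells) / n  # midpoint inside ((k-1)/n, k/n]
            total += empirical_quantile(sample, s) / (cells * n)
    return total


sample = [-0.7, -2.1, -0.3, -1.5, -0.9, -3.2, -0.1, -1.1]  # negative, as R_Delta is
lhs = upper_order_mean(sample, j=6)
rhs = quantile_integral(sample, j=6)
```

Both quantities equal the normalized sum of the three largest values of the sample.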

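Applying this representation yields

$$n^{-1} \log \Lambda_\eta(Y_{0:n}, \alpha) \le \int_0^1 \mathbb{1}\{u > 1 - r_n\}\,F^{-1}_n(u)\,du , \tag{57}$$

where $r_n = (\lceil n\alpha \rceil - 1)/n$, $F_n(t) = n^{-1} \sum_{k=1}^n \mathbb{1}\{R_\Delta(ab|\varepsilon_{k-1}| + |\zeta_k| + b|\varepsilon_k|) \le t\}$
and $F^{-1}_n$ is its generalized inverse. The function $R_\Delta$ defined by (56) is negative
and then $F_n(0) = 1$, which implies that $F^{-1}_n(u) \le 0$ for all $u \in (0,1)$. Thus,
by Fatou's lemma,

$$\limsup_{n\to\infty} \int_0^1 \mathbb{1}\{u > 1 - r_n\}\,F^{-1}_n(u)\,du \le \int_0^1 \limsup_{n\to\infty} \mathbb{1}\{u > 1 - r_n\}\,F^{-1}_n(u)\,du , \quad \mathbb{P}^*\text{-a.s.} \tag{58}$$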

The following lemma is a generalization of [18, Lemma 21.2].

Lemma 14. Let $\{\Psi_n\}_{n\ge 0}$ be a sequence of nondecreasing functions and $\Psi$
a bounded nondecreasing function such that, for all $x \in \mathsf{X}$, $\lim_{n\to\infty} \Psi_n(x) =
\Psi(x)$. Then, $\Psi^{-1}$ has at most a countable number of discontinuity points
and, at any point $u$ where $\Psi^{-1}$ is continuous,

$$\lim_{n\to\infty} \Psi^{-1}_n(u) = \Psi^{-1}(u) .$$

Let us denote $F(t) = \mathbb{P}[R_\Delta(ab|\varepsilon_0| + |\zeta_1| + b|\varepsilon_1|) \le t]$ and notice that
$F(0) = 1$. Then, combining (57), (58) and Lemma 13 leads to

$$\limsup_{n\to\infty} n^{-1} \log \Lambda_\eta(Y_{0:n}, \alpha) \le \int_{1-\alpha}^1 F^{-1}(u)\,du < 0 , \quad \mathbb{P}^*\text{-a.s.}$$

This shows that the fourth condition (22) is satisfied and finally, Theorem
5 applies.

□

Let us recall that $\psi_Z$ denotes the log-moment generating function of the
random variable $Z$ defined by $\psi_Z(\lambda) := \log \mathbb{E}[e^{\lambda Z}]$ and we define its Legendre
transform by

$$\psi^*_Z(x) = \sup_{\lambda \ge 0}\{x\lambda - \psi_Z(\lambda)\} .$$
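
As a quick illustration of this one-sided Legendre transform, the sketch below evaluates it on a grid for a standard Gaussian variable, for which $\psi_Z(\lambda) = \lambda^2/2$ and the supremum over $\lambda \ge 0$ equals $x^2/2$ when $x \ge 0$ and $0$ otherwise. The Gaussian choice, the grid parameters and the function names are assumptions made only for this example.

```python
def log_mgf_gaussian(lam):
    """psi_Z(lambda) = lambda^2 / 2 for a standard Gaussian Z (illustrative choice)."""
    return 0.5 * lam * lam


def legendre_transform(psi, x, lam_max=10.0, steps=20000):
    """Approximate sup_{lambda >= 0} {x * lambda - psi(lambda)} on a uniform grid."""
    best = 0.0  # value at lambda = 0, since psi(0) = log E[1] = 0
    for i in range(1, steps + 1):
        lam = lam_max * i / steps
        best = max(best, x * lam - psi(lam))
    return best


rate_at_2 = legendre_transform(log_mgf_gaussian, 2.0)      # close to 2^2 / 2
rate_at_neg = legendre_transform(log_mgf_gaussian, -1.0)   # supremum reached at lambda = 0
```

Because the supremum is restricted to $\lambda \ge 0$, the transform vanishes for negative arguments, which is why it only controls upper deviations.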

Proof of Proposition 9. We start by giving an exponential inequality
for $m$-dependent variables.

Lemma 15. Let $\{Z_k\}_{k\ge 0}$ be a sequence of $m$-dependent stationary random
variables. There exists some constant $C > 0$ such that, for all $M > 0$,

$$\mathbb{P}\Big(\sum_{k=1}^n Z_k \ge Mn\Big) \le C \exp\big[-n\,\psi^*_{Z_1}(2Mm)/(2m)\big] .$$

The proof is elementary and left to the reader. It follows from inequality (53)
that

$$r_1(n) = \mathbb{P}\Big(n^{-1} \sum_{k=2}^n \log \varepsilon^-_\Delta(Y_{k-1}, Y_k) \le -M_1\Big) \le \mathbb{P}\Big(\sum_{k=2}^n Z^\Delta_k \ge M_1 n\Big) .$$

Thanks to (40), by applying Lemma 15, there exist some constants $c_1, \delta_1 > 0$
such that $r_1(n) \le c_1 e^{-\delta_1 n}$. Since $\upsilon$ is bounded, we can choose $M_2$ large
enough such that $r_2(n) = 0$. By (55), for all $(y,y') \in \mathsf{Y}^2$, $\Psi_{C(y',\Delta)}(y,y') \ge
q_\Delta$, for some $q_\Delta > 0$. Then, by choosing $M_3$ large enough, we have $r_3(n) = 0$.
For $r_4(n)$, we need an exponential inequality for L-statistics based on
independent variables.

Lemma 16. Let $\{U_k\}_{k\ge 0}$ be a sequence of $m$-dependent stationary negative
random variables. For all $\alpha \in (0,1)$, there exists a real $r > 0$ such
that

$$\lim_{n\to\infty} n^{-1} \log \mathbb{P}\Big(\sum_{k=n-\lceil \alpha n\rceil+1}^n U_{n,k} \ge -rn\Big) < 0 .$$

Proof of Lemma 16. For $j \in \{1, \dots, m\}$, define $I_j = \{j, j+m, j+2m, \dots\}$
and let $n_j = |I_j|$ be the cardinality of $I_j$. For any $j \in \{1, \dots, m\}$, the
sequence $\{U_k, k \in I_j\}$ is i.i.d. Denote by $\{U^{(j)}_{n_j,k}\}_{1\le k\le n_j}$ the ordered statistics
of the sequence $\{U_k, k \in I_j\}$. Since $U_k \le 0$ for all integer $k$, it then follows that

$$\sum_{k=n-\lceil \alpha n\rceil+1}^n U_{n,k} \le \sum_{j=1}^m \ \sum_{k=(n_j-\lceil \alpha n\rceil+1)\vee 0}^{n_j} U^{(j)}_{n_j,k}$$

and

$$\mathbb{P}\Big(\sum_{k=n-\lceil \alpha n\rceil+1}^n U_{n,k} \ge -rn\Big) \le \sum_{j=1}^m \mathbb{P}\Big(\sum_{k=(n_j-\lceil \alpha n\rceil+1)\vee 0}^{n_j} U^{(j)}_{n_j,k} \ge -rn/m\Big) ,$$

for all $n$ larger than some integer $N$. The sequence $\{U^{(j)}_k\}_{1\le k\le n_j}$ is a
sequence of i.i.d. random variables. Then, using [10, Theorem 6.1], we have

$$\lim_{n\to\infty} n_j^{-1} \log \mathbb{P}\Big(n_j^{-1} \sum_{k=(n_j-\lceil \alpha n\rceil+1)\vee 0}^{n_j} U^{(j)}_{n_j,k} \ge -\delta\Big) < 0 ,$$

for some positive $\delta$ and the result follows since $n_j/n = 1/m + o(1)$.

□
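
The blocking device used in the proof of Lemma 16 (splitting the index set into $m$ arithmetic progressions) can be sketched as follows. The function name `residue_classes` and the values of `n` and `m` are hypothetical and chosen only to illustrate that the classes partition $\{1, \dots, n\}$ and that each has size close to $n/m$.

```python
def residue_classes(n, m):
    """Split the indices 1..n into I_j = {j, j+m, j+2m, ...} for j = 1..m."""
    return {j: list(range(j, n + 1, m)) for j in range(1, m + 1)}


n, m = 23, 4
classes = residue_classes(n, m)
sizes = {j: len(indices) for j, indices in classes.items()}  # the n_j of the proof
```

Within a class, consecutive indices are exactly $m$ apart, which is what allows the proof to treat each subsequence separately and then recombine the $m$ bounds with a union bound.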

Define $U_k = R_\Delta[ab|\varepsilon_{k-1}| + |\zeta_k| + b|\varepsilon_k|]$ for all integer $k \ge 1$. By the
definition (16) of $\Lambda_\eta$,

$$n^{-1} \log \Lambda_\eta(Y_{0:n}, \alpha) \le n^{-1} \sum_{k=n-\lceil \alpha n\rceil+1}^n U_{n,k} . \tag{59}$$

Then, by equation (59) and by applying Lemma 16, there exist some constants
$c_4, \delta_4 > 0$ such that $r_4(n) \le c_4 e^{-\delta_4 n}$. Finally, under assumptions (E1),
(E2) and (40), Theorem 6 applies and provides a geometric rate.

□

6. Proofs of Propositions 11 and 12.

Proof of Proposition 11. Let us define, for all $\Delta > 0$ and for all
integer $k \ge 1$,

$$V^{*\Delta}_k = \log q^-\big[c + d\Delta + \kappa + 2a^*b^* + a^*b^*|\varepsilon^*_{k-1}| + b^*|\varepsilon^*_k| + |\zeta^*_k|\big] . \tag{60}$$

Using the definitions (44), (45) of $q^-$ and $\varepsilon^-_\Delta$, Lemma 10 shows that

$$n^{-1} \sum_{k=2}^n \log \varepsilon^-_\Delta(Y^*_{k-1}, Y^*_k) \ge n^{-1} \sum_{k=2}^n V^{*\Delta}_k . \tag{61}$$

Thus, to check (19), it suffices to control the asymptotic behavior of the
right-hand side of this inequality. We use the following result [11, Chapter 2,
Section 6].

Lemma 17. Let us denote by $\{\mathcal{H}_k\}_{k\ge 0}$ a filtration and consider a sequence
$\{U_k\}_{k\ge 0}$ of random variables adapted to $\{\mathcal{H}_k\}_{k\ge 0}$. Let us assume that there
exists a random variable $U$ such that $\mathbb{E}(|U| \log^+ |U|) < \infty$ and $\mathbb{P}(|U_k| >
x) \le c\,\mathbb{P}(|U| > x)$ for all $x \ge 0$ and some $c > 0$. Then

$$\lim_{n\to\infty} n^{-1} \sum_{k=1}^n \big[U_k - \mathbb{E}(U_k \mid \mathcal{H}_{k-1})\big] = 0 , \quad \mathbb{P}\text{-a.s.}$$

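Define the filtration $\{\mathcal{F}^*_k\}_{k\ge 0}$ where $\mathcal{F}^*_k = \sigma\big(\{X^*_j\}_{0\le j\le k}, \{\zeta^*_j\}_{0\le j\le k}, \{\varepsilon^*_j\}_{j\ge 0}\big)$.
Since $q^-$ defined in (44) is non-increasing, there exists $c > 0$ such that, for
all $x \ge 0$, $\mathbb{P}(|V^{*\Delta}_k| > x) \le c\,\mathbb{P}(|V^{*\Delta}| > x)$, where $V^{*\Delta}$ is defined in (49).
Hence, we may apply Lemma 17, which yields, for any $\Delta > 0$,

$$\liminf_{n\to\infty} n^{-1} \sum_{k=2}^n V^{*\Delta}_k = \liminf_{n\to\infty} n^{-1} \sum_{k=2}^n \mathbb{E}\{V^{*\Delta}_k \mid \mathcal{F}^*_{k-1}\} , \quad \mathbb{P}^*\text{-a.s.} \tag{62}$$

By (47), since for all $x > 0$, $\log x \ge -\log_- x$, then, by the strong law of
large numbers,

$$\liminf_{n\to\infty} n^{-1} \sum_{k=2}^n \mathbb{E}\{V^{*\Delta}_k \mid \mathcal{F}^*_{k-1}\} \ge -\mathbb{E}\big[H_\Delta(a^*b^*|\varepsilon^*_0| + b^*|\varepsilon^*_1|)\big] , \quad \mathbb{P}^*\text{-a.s.} , \tag{63}$$

where $H_\Delta(x) = \mu^*_+ \int \log_- q^-[c + d\Delta + \kappa + 2a^*b^* + x + |w|]\,\psi^*(w)\,dw$. By
(50), $\mathbb{E}[H_\Delta(a^*b^*|\varepsilon^*_0| + b^*|\varepsilon^*_1|)] < \infty$; it then follows by (62) and (63)
that

$$\liminf_{n\to\infty} n^{-1} \sum_{k=2}^n \log \varepsilon^-_\Delta(Y^*_{k-1}, Y^*_k) \ge \liminf_{n\to\infty} n^{-1} \sum_{k=2}^n V^{*\Delta}_k > -\infty , \quad \mathbb{P}^*\text{-a.s.} ,$$
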
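and the condition (19) is satisfied. Assumptions (20) and (21)
can be checked as in the proof of Proposition 7. Since (E2) is satisfied, for a fixed $\eta > 0$,
we choose $\Delta > 0$ such that $\Upsilon_{C^c(y,\Delta)}(y) \le \eta\,\Upsilon_{\mathsf{X}}(y)$. Applying Lemma 13
yields

$$n^{-1} \log \Lambda_\eta(Y^*_{0:n}, \alpha) \le \int_0^1 \mathbb{1}\{1 - r_n < u\}\,F^{*-1}_n(u)\,du , \quad \mathbb{P}^*\text{-a.s.} ,$$

where $r_n = (\lceil n\alpha \rceil - 1)/n$ and $F^{*-1}_n$ is the generalized inverse of the
distribution function

$$F^*_n(t) = n^{-1} \sum_{k=1}^n \mathbb{1}\big\{R_\Delta[\kappa + 2a^*b^* + a^*b^*|\varepsilon^*_{k-1}| + b^*|\varepsilon^*_k| + |\zeta^*_k|] \le t\big\} , \tag{64}$$

with $R_\Delta$ defined in (56). For convenience, let us write $G(\varepsilon^*_{k-1}, \varepsilon^*_k, \zeta^*_k) =
R_\Delta[\kappa + 2a^*b^* + a^*b^*|\varepsilon^*_{k-1}| + b^*|\varepsilon^*_k| + |\zeta^*_k|]$. Setting

$$H^*_n(t) = n^{-1} \sum_{k=1}^n \mathbb{P}\big\{G(\varepsilon^*_{k-1}, \varepsilon^*_k, \zeta^*_k) \le t \mid \mathcal{F}^*_{k-1}\big\} , \tag{65}$$

it follows from Lemma 17 that, for a fixed $t \in \mathbb{R}$,

$$\lim_{n\to\infty} \{F^*_n(t) - H^*_n(t)\} = 0 , \quad \mathbb{P}^*\text{-a.s.} \tag{66}$$

The convergence in (66) may be shown to hold uniformly in $t$: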

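Lemma 18. Let us consider the stochastic functions $F^*_n$ and $H^*_n$ defined
by (64), (65). Then,

$$\lim_{n\to\infty} \sup_{t\in\mathbb{R}} |F^*_n(t) - H^*_n(t)| = 0 , \quad \mathbb{P}^*\text{-a.s.} \tag{67}$$

Proof. Let us define

$$J^*_n(t) = n^{-1} \sum_{k=1}^n \int \mathbb{P}\big\{G(\varepsilon^*_{k-1}, \varepsilon^*_k, w) \le t \mid \mathcal{F}^*_{k-1}\big\}\,\psi^*(w)\,dw , \tag{68}$$

$$J^*(t) = \mathbb{E}\Big[\int \mathbb{1}\{G(\varepsilon^*_0, \varepsilon^*_1, w) \le t\}\,\psi^*(w)\,dw\Big] . \tag{69}$$

By the Glivenko-Cantelli Theorem, $\lim_{n\to\infty} \|J^*_n - J^*\|_\infty = 0$, $\mathbb{P}^*$-a.s. Set
$\epsilon > 0$ and a sequence $-\infty = t_0 < t_1 < \dots < t_N = \infty$ such that $J^*(t_i) -
J^*(t_{i-1}) \le \epsilon/\mu^*_+$ for every $i$. By (47), for all real numbers $t < t'$, $\mathbb{P}^*$-a.s.,

$$H^*_n(t') - H^*_n(t) = n^{-1} \sum_{k=1}^n \mathbb{P}\big\{t < G(\varepsilon^*_{k-1}, \varepsilon^*_k, \zeta^*_k) \le t' \mid \mathcal{F}^*_{k-1}\big\} \le \mu^*_+\,[J^*_n(t') - J^*_n(t)] ,$$

and then

$$\limsup_{n\to\infty} |H^*_n(t') - H^*_n(t)| \le \mu^*_+\,|J^*(t') - J^*(t)| , \quad \mathbb{P}^*\text{-a.s.}$$

For all $t \in \mathbb{R}$, there exists an index $i$ such that $t_{i-1} \le t < t_i$. Since $F^*_n$ and
$H^*_n$ are increasing functions, it follows that

$$F^*_n(t_{i-1}) \le F^*_n(t) \le F^*_n(t_i^-) , \qquad H^*_n(t_{i-1}) \le H^*_n(t) \le H^*_n(t_i^-) .$$

These inequalities imply

$$\sup_{t\in\mathbb{R}} |F^*_n(t) - H^*_n(t)| \le \max_{0\le i\le N} |F^*_n(t_i^-) - H^*_n(t_i^-)| + \max_{1\le i\le N} |H^*_n(t_i^-) - H^*_n(t_{i-1})| ,$$

and then

$$\limsup_{n\to\infty} \sup_{t\in\mathbb{R}} |F^*_n(t) - H^*_n(t)| \le \epsilon , \quad \mathbb{P}^*\text{-a.s.}$$

□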

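By (47), for all $t \in \mathbb{R}$,

$$F^*_n(t) = F^*_n(t) - H^*_n(t) + H^*_n(t) \ge F^*_n(t) - H^*_n(t) + \mu^*_-\,J^*_n(t) , \quad \mathbb{P}^*\text{-a.s.}$$

Hence, using the limit (67), for a given $\delta > 0$, there exists an integer $\ell$ such
that, for all $n \ge \ell$ and $t \in \mathbb{R}$,

$$F^*_n(t) \ge \mu^*_-\,J^*_n(t) - \delta , \quad \mathbb{P}^*\text{-a.s.} \tag{70}$$

Let us notice that $J^*_n$ is an increasing function with $\lim_{t\to-\infty} J^*_n(t) = 0$ and
$\lim_{t\to+\infty} J^*_n(t) = 1$. Then, we can define its generalized inverse denoted by
$J^{*-1}_n$. By (70), it follows that, for all $u \in [0, (\mu^*_- - \delta) \vee 0]$,

$$F^{*-1}_n(u) \le J^{*-1}_n[(u + \delta)/\mu^*_-] , \quad \mathbb{P}^*\text{-a.s.}$$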

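By choosing $\delta > 0$ such that $\mu^*_- - \delta > 1 - \alpha$, there exists an integer $\ell' \ge \ell$
such that, for all $n \ge \ell'$, we have

$$\int_0^1 \mathbb{1}\{1 - r_n < u\}\,F^{*-1}_n(u)\,du \le \int_0^1 \mathbb{1}\{1 - r_n < u < \mu^*_- - \delta\}\,J^{*-1}_n[(u + \delta)/\mu^*_-]\,du , \quad \mathbb{P}^*\text{-a.s.}$$

By Fatou's lemma,

$$\limsup_{n\to\infty} \int_0^1 \mathbb{1}\{1 - r_n < u\}\,F^{*-1}_n(u)\,du \le \int_0^1 \limsup_{n\to\infty} \mathbb{1}\{1 - r_n < u < \mu^*_- - \delta\}\,J^{*-1}_n[(u + \delta)/\mu^*_-]\,du , \quad \mathbb{P}^*\text{-a.s.}$$

It follows by Lemma 14 that

$$\limsup_{n\to\infty} n^{-1} \log \Lambda_\eta(Y^*_{0:n}, \alpha) \le \int_{1-\alpha}^{\mu^*_- - \delta} J^{*-1}[(u + \delta)/\mu^*_-]\,du < 0 , \quad \mathbb{P}^*\text{-a.s.}$$

Thus, condition (22) is satisfied and Theorem 5 applies.

□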

Proof of Proposition 12. It follows, by definition of $r_1$, Lemma 10
and (O3), that

$$r_1(n) = \mathbb{P}^*\Big(n^{-1} \sum_{k=2}^n \log q^-[c + d\Delta + D(Y^*_{k-1}, Y^*_k)] \le -M_1\Big)$$
$$\le \mathbb{P}^*\Big(n^{-1} \sum_{k=2}^n \log q^-\big[c_0 + a^*b^*|\varepsilon^*_{k-1}| + b^*|\varepsilon^*_k| + g^*_+(A^*_k)\big] \le -M_1\Big) ,$$

with $c_0 = c + d\Delta + \kappa + 2a^*b^*$. Then, by (O4) and applying Lemma 15, there
exist some constants $c_1, \delta_1 > 0$ such that $r_1(n) \le c_1 e^{-\delta_1 n}$. By the same
arguments as in the proof of Proposition 9, the real numbers $M_2$ and $M_3$ can
be chosen large enough such that $r_2(n) = 0$ and $r_3(n) = 0$. Let us denote by
$\{U^*_k\}_{k\ge 0}$ the sequence defined by $U^*_k = R_\Delta[\kappa + 2a^*b^* + a^*b^*|\varepsilon^*_{k-1}| + b^*|\varepsilon^*_k| +
g^*_+(A^*_k)]$, for all integer $k \ge 1$. By definition of $\Lambda_\eta$,

$$n^{-1} \log \Lambda_\eta(Y^*_{0:n}, \alpha) \le n^{-1} \sum_{k=n-\lceil \alpha n\rceil+1}^n U^*_{n,k} . \tag{71}$$

By applying Lemma 16, there exist some constants $c_4, \delta_4 > 0$ such that
$r_4(n) \le c_4 e^{-\delta_4 n}$. Finally, Theorem 6 applies and provides a geometric rate.

□